This notebook presents a comprehensive framework for validating Surface Water and Ocean Topography (SWOT) satellite altimetry data using in-situ measurements from the BioSWOT-Med campaign in the Northwestern Mediterranean Sea. The SWOT mission, launched in December 2022 as a collaboration between NASA and CNES, represents a significant advancement in oceanographic remote sensing with its innovative wide-swath interferometric altimetry technology.
The BioSWOT-Med campaign, conducted in the Northwestern Mediterranean Sea, employed a Moving Vessel Profiler (MVP) to collect high-resolution temperature and salinity profiles along multiple transects. These measurements coincided with SWOT overpasses, creating an ideal dataset for satellite validation. The dynamic height calculated from MVP density profiles provides a direct in-situ counterpart to SWOT's satellite-measured sea surface height.
The in-situ MVP data used in this analysis was acquired by the Mediterranean Institute of Oceanography (MIO) in Marseille, France. This notebook represents the results of my research internship at MIO and consists of four complementary code sections, each addressing specific aspects of SWOT validation. While you can navigate directly to any section of interest, reading them in sequence is recommended for a comprehensive understanding of the validation framework:
Validation Pipeline: Implements the complete processing chain from raw data to comparative visualization, producing maps and profiles comparing MVP dynamic height with SWOT Absolute Dynamic Topography (ADT). This section answers: How well do SWOT-derived sea surface height and geostrophic velocities match the in-situ measurements?
Reference Level Optimization: Systematically tests different reference depths for dynamic height calculations to identify the optimal "level of no motion." This section addresses: What reference depth yields the best agreement between MVP dynamic height and SWOT altimetry?
Statistical Validation: Provides quantitative metrics (correlation, RMSE, regression slope) to assess the improvement achieved with optimized reference levels. This section answers: How does optimizing the reference level improve the agreement between MVP and SWOT measurements?
Spectral Analysis: Examines the power spectral density and coherence of MVP and satellite measurements across different spatial scales. This section addresses: At what spatial scales does SWOT accurately capture oceanographic features, and how does it compare to conventional altimetry?
All visualizations are generated at the end of each code section. For a more detailed discussion and interpretation of results, please refer to the complete thesis available on my GitHub repository.
The validation methods presented here are applicable to other regions and datasets, providing a robust framework for assessing satellite altimetry performance. By understanding the strengths and limitations of SWOT measurements across different spatial scales, particularly in the mesoscale-to-submesoscale transition region, we can better leverage this revolutionary altimetry technology for oceanographic research and applications.
This first section of the notebook implements a comprehensive workflow for validating Surface Water and Ocean Topography (SWOT) satellite altimetry data against in-situ measurements collected by the Moving Vessel Profiler (MVP).
This validation code implements the full comparison chain described below, from loading the raw MVP and satellite data to generating the diagnostic figures.
The analysis focuses on multiple "PL" (plongée) transects, where the research vessel collected MVP profiles coincident with SWOT overpasses. For each transect, the code generates a suite of diagnostic visualizations that examine both the absolute and relative agreement between satellite and in-situ observations.
Understanding the correspondence between these different measurement systems is crucial for assessing SWOT's capability to resolve fine-scale oceanographic features like fronts and eddies, which are important drivers of ocean dynamics and marine ecosystems.
The first section of our code imports essential scientific computing libraries that provide the foundation for oceanographic data analysis:
gsw (Gibbs SeaWater): A Python implementation of the Thermodynamic Equation of Seawater 2010 (TEOS-10). This toolbox is crucial for accurately calculating thermodynamic properties of seawater, particularly density and derived quantities like dynamic height and geostrophic velocity.
numpy: The fundamental package for scientific computing in Python, which provides support for large multi-dimensional arrays, matrices, and mathematical functions to operate on these arrays.
scipy.io: A module from SciPy that provides functions for working with various file formats. In our context, it is primarily used for loading MATLAB (.mat) files containing MVP data through the loadmat() function.
These core libraries establish the computational foundation for our oceanographic analysis. The Gibbs SeaWater (gsw) toolbox is particularly important as it implements the internationally adopted TEOS-10 standard, which has replaced the previous EOS-80 equation of state in oceanographic practice. This ensures that our calculations of density, dynamic height, and geostrophic velocities adhere to the most accurate thermodynamic representation of seawater properties.
In oceanographic studies, the precise calculation of density from temperature and salinity measurements is critical, as even small errors in density can propagate into significant errors in derived quantities like geostrophic currents. The gsw library provides functions that account for the spatial variations in the composition of seawater, offering improvements over older methods that assumed constant composition.
## 1.1 Core scientific computing
import gsw # Gibbs Seawater toolbox
import numpy as np
import scipy.io as sio
Oceanographic data analysis requires specialized tools for handling geographic coordinates, calculating distances on Earth's surface, and creating map projections. The following libraries support these geospatial operations:
geopy.distance: A module from the GeoPy package that provides methods for calculating geodesic distances between points on Earth's surface. Unlike simple Euclidean distance calculations, geodesic distances account for Earth's curvature, which is essential for accurate distance measurements, especially over longer transects. In our analysis, this library will be used to calculate the cumulative distance along MVP sampling tracks.
mpl_toolkits.basemap: A toolkit for plotting data on map projections in matplotlib. Basemap transforms coordinates from geographic (longitude, latitude) to map projections (x, y), enabling the creation of publication-quality maps with customizable features such as coastlines, political boundaries, and filled continents. While newer libraries like Cartopy are gradually replacing Basemap, it remains a robust tool for creating oceanographic maps.
Together, these geospatial libraries enable precise distance calculations along ship transects and effective visualization of oceanographic data in their proper geographical context. These capabilities are particularly important for our analysis, as we need to accurately represent the spatial relationships between satellite data and in-situ measurements across the Northwestern Mediterranean basin.
## 1.2 Geospatial processing
import geopy.distance
from mpl_toolkits.basemap import Basemap
Effective data visualization is critical for oceanographic analysis, allowing us to interpret complex spatial patterns and compare different datasets. The core visualization library imported in this section is:
matplotlib.pyplot: The most widely used plotting library in the Python scientific ecosystem, providing a MATLAB-like interface for creating a wide variety of static, animated, and interactive visualizations. In our analysis, matplotlib (imported as plt) will be used to generate multiple plot types, including maps of the study region, along-track profiles, and comparative velocity plots.
Matplotlib offers extensive customization options that will allow us to create clear, informative visualizations that highlight the key oceanographic features of interest. We'll use specific color schemes (defined later in the code) to consistently represent different data sources: MVP measurements (red), SWOT-derived ADT velocities (blue), SWOT total velocities (green), and SWOT projected velocities (dark green).
Through these visualizations, we'll be able to qualitatively and quantitatively assess the agreement between satellite and in-situ observations, identify oceanographic features like eddies and fronts, and evaluate the performance of the SWOT mission in capturing fine-scale ocean dynamics.
## 1.3 Visualisation tools
import matplotlib.pyplot as plt
Oceanographic datasets come in various formats, each requiring specific libraries for effective handling. This section imports essential tools for data manipulation, file operations, and time series management:
pandas: A powerful data analysis library providing data structures like DataFrames that are ideal for tabular data manipulation. In our workflow, pandas will be used to organize MVP profiles, compute rolling averages for smoothing, and manage geospatial data along transects.
glob: A module that provides Unix-style pathname pattern expansion, allowing us to find files matching specific patterns. This is particularly useful for locating SWOT and DUACS data files that follow standard naming conventions but have variable date components.
os: The operating system interface module that provides functions for interacting with the file system, helping us manipulate file paths and extract file information.
datetime and timedelta: Classes from the standard library for manipulating dates and times. These are essential for handling the temporal aspects of oceanographic data, particularly for matching MVP transects with corresponding satellite overpasses.
xarray: A library designed for working with labeled, multi-dimensional arrays, particularly suited for handling NetCDF files common in geoscience. Xarray combines the capabilities of pandas and NumPy with the self-describing nature of NetCDF files, making it excellent for working with gridded satellite data.
netCDF4: A Python interface to the NetCDF C library, providing direct access to NetCDF files. While xarray builds on netCDF4 to provide a higher-level interface, this module gives us lower-level access when needed.
Together, these libraries provide a comprehensive toolkit for handling the diverse data formats and structures encountered in oceanographic analysis. They facilitate the integration of disparate data sources—from ship-based instruments to satellite observations—into a coherent analytical framework.
## 1.4 Data handling and I/O
import pandas as pd
import glob
import os
from datetime import datetime, timedelta
import xarray as xr
import netCDF4 as nc
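The kind of geographic masking applied to SWOT swaths later in this notebook can be sketched with a tiny in-memory xarray dataset (the 2-D coordinates and bounds below are synthetic, invented for illustration):

```python
import numpy as np
import xarray as xr

# Tiny synthetic swath: 2-D longitude/latitude coordinates (degrees)
lon2d, lat2d = np.meshgrid(np.arange(4.0, 8.0), np.arange(41.0, 45.0))
ds = xr.Dataset(
    {"ssha": (("y", "x"), np.zeros_like(lon2d))},
    coords={"longitude": (("y", "x"), lon2d),
            "latitude": (("y", "x"), lat2d)},
)

# [lon_min, lat_min, lon_max, lat_max], with strict inequalities
area = [5.0, 42.0, 7.0, 44.0]
selection = ((ds.longitude > area[0]) & (ds.longitude < area[2]) &
             (ds.latitude > area[1]) & (ds.latitude < area[3]))
sub = ds.where(selection)  # keeps the grid shape, masks points outside the box
print(int(sub.ssha.notnull().sum()))  # only the (lon=6, lat=43) cell survives
```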
Oceanographic measurements often come from irregularly spaced sampling points, while analysis and comparison require data on regular grids or at specific locations. The following interpolation and spatial analysis tools address this challenge:
scipy.interpolate.griddata: A versatile function for interpolating scattered data onto a regular grid or at arbitrary query points. This tool is crucial for our workflow, as it allows us to interpolate the scattered SWOT measurements onto the exact locations of the MVP profiles.
Griddata supports multiple interpolation methods, including nearest-neighbor, linear, and cubic spline techniques, allowing us to select the appropriate method based on data density and the physical characteristics of the field being interpolated.
scipy.spatial.cKDTree: An optimized k-dimensional tree implementation that efficiently finds nearest neighbors in multi-dimensional space. In our analysis, this data structure accelerates the process of locating the closest satellite data points to each MVP measurement location, significantly improving computational performance when working with large satellite datasets.
scipy.spatial.ConvexHull: A computational geometry tool that calculates the smallest convex set containing a given set of points. We'll use this to delineate frontal zones in the Northwestern Mediterranean, creating polygon boundaries that can be overlaid on our maps to highlight oceanographically significant regions.
These spatial interpolation and analysis utilities bridge the gap between different sampling strategies, enabling direct comparison between satellite and in-situ measurements despite their inherently different spatial resolutions and sampling geometries.
## 1.5 Interpolation utilities
from scipy.interpolate import griddata
from scipy.spatial import cKDTree, ConvexHull
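A minimal sketch of the nearest-neighbour lookup that cKDTree provides (the points and query position below are synthetic, not real satellite or MVP data):

```python
import numpy as np
from scipy.spatial import cKDTree

# Synthetic "satellite" sample positions scattered in a lon/lat box
rng = np.random.default_rng(42)
sat_points = rng.uniform([4.0, 41.0], [8.0, 45.0], size=(1000, 2))

tree = cKDTree(sat_points)     # build the tree once ...
query = np.array([6.0, 43.0])  # ... then query many "MVP" positions cheaply
dist, idx = tree.query(query)  # distance to, and index of, the nearest point
print(dist, sat_points[idx])
```

Building the tree once and querying repeatedly is what makes this approach much faster than brute-force distance scans over large swath files.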
When processing multiple oceanographic datasets and running computationally intensive analyses, efficient file management and execution tracking become essential. This section imports utilities that improve code functionality and user experience:
re: The regular expression module from the Python standard library provides sophisticated pattern matching capabilities. In our workflow, regular expressions are particularly useful for extracting dates and other metadata from standardized filenames.
For example, we'll use pattern matching to identify SWOT files corresponding to specific oceanographic survey dates, ensuring proper temporal alignment between satellite and in-situ observations.
tqdm: A fast, extensible progress bar library that provides visual feedback during long-running operations. When processing multiple MVP transects (referred to as PLs or "plongées" in the code), tqdm will display a progress bar showing how many transects have been completed and an estimate of the remaining run time.
This feedback is invaluable when running analyses that might take significant time, particularly when processing the full dataset of 20 transects (PL5 through PL24).
These utilities enhance both the functionality and usability of our analysis pipeline, making it easier to manage complex file structures and monitor the progress of computationally intensive operations.
## 1.6 Pattern matching and progress monitoring
import re
from tqdm import tqdm
Robust validation of satellite altimetry data against in-situ measurements requires rigorous statistical analysis to quantify agreement, identify biases, and assess uncertainties. To support these statistical evaluations, we import:
statsmodels.api: A comprehensive library providing classes and functions for estimating statistical models, conducting statistical tests, and exploring data. In our oceanographic validation context, statsmodels offers several key capabilities, including ordinary least squares regression for estimating the slope and intercept of MVP-versus-SWOT comparisons, and tools for producing diagnostic plots.
The comment "# diagnostic plots" indicates that we'll primarily use statsmodels for creating visual diagnostics to assess the quality of our data and statistical models.
Statistical analysis is particularly important in our SWOT validation workflow as it provides quantitative metrics of agreement between different observation systems, helping to characterize the accuracy and precision of satellite-derived oceanographic variables relative to direct in-situ measurements. These metrics will ultimately help determine whether SWOT data can reliably capture fine-scale oceanographic features in the Northwestern Mediterranean.
## 1.7 Statistical diagnostics
import statsmodels.api as sm # diagnostic plots
Consistent visual encoding is crucial for effective data presentation, especially when comparing multiple data sources across numerous figures. This section defines a standardized color scheme for the different velocity measurements:
swot_color_adt = "blue": Represents velocities derived from SWOT Absolute Dynamic Topography (ADT) gradients. These velocities are calculated by applying geostrophic balance equations to the sea surface height field measured by the satellite.
swot_color_total = "green": Denotes the total geostrophic velocities provided directly in the SWOT data products (ugos, vgos). These may incorporate additional processing steps beyond simple ADT gradient calculations.
swot_color_proj = "#006400" (dark green): Used for SWOT velocities that have been projected onto directions perpendicular to the MVP transect. This projection facilitates direct comparison with MVP-derived velocities, which are inherently perpendicular to the ship's track.
mvp_color = "red": Identifies velocities calculated from the Moving Vessel Profiler (MVP) data. These in-situ measurements serve as our reference dataset for validating the satellite observations.
This consistent color coding will be applied throughout all visualizations, making it easier to track different data sources across multiple plots and facilitating the interpretation of comparative analyses. The use of distinct, contrasting colors ensures that velocity vectors from different sources remain visually distinguishable even when overlaid on the same map or graph.
## 1.8 Colour palette for plots
swot_color_adt = "blue" # SWOT velocities derived from ADT
swot_color_total = "green" # SWOT total velocities
swot_color_proj = "#006400" # SWOT projected velocities
mvp_color = "red" # MVP velocities
This section contains various utility functions that form the backbone of our oceanographic data processing pipeline. These functions handle common operations such as loading satellite data, converting time formats, and manipulating vector fields.
The loadsshuv function provides a flexible interface for loading sea surface height (SSH) and velocity fields from NetCDF files, with special handling for SWOT data. This centralized loading function ensures consistent data processing across different satellite products.
loadsshuv(fname, rep, varn, **kwargs)
Parameters:
- `fname`: Filename pattern to load
- `rep`: Repository (directory) where the file is located
- `varn`: Dictionary mapping standard variable names to dataset-specific names
- `**kwargs`: Additional options including `type` (data source) and `area` (geographic bounds)

Key oceanographic variables extracted include longitude/latitude, the geostrophic velocity components (`ugos`, `vgos`), the filtered sea surface height anomaly (`ssha_filtered`), the mean sea surface (`mss`), and the mean dynamic topography (`mdt`).
The function implements specific processing for SWOT data, including:

- Optional geographic sub-setting via the `area` parameter to focus on a specific region
- Conversion of each field to a masked array with `to_masked_array()`
- Removal of across-track rows that contain only NaNs
- Computation of Absolute Dynamic Topography as `adt = ssha + mdt`

For non-SWOT data (e.g., DUACS), the function uses a more generic approach, extracting variables based on the names provided in the varn dictionary.
The result is a standardized dictionary containing all necessary variables for oceanographic analysis, regardless of the input data source. This abstraction layer simplifies downstream code by providing a consistent interface to different altimetry products.
### 2.1.1 Generic loader for SSH and velocity fields
def loadsshuv(fname, rep, varn, **kwargs):
    """
    Load lon/lat, SSH, U, V and ADT fields from a NetCDF file,
    with optional geographic sub-setting.
    """
    if kwargs.get('type') == 'swot':
        filei2 = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei2)
        if 'area' in kwargs:
            area = kwargs['area']
            selection = ((ds.longitude > area[0]) &
                         (ds.longitude < area[2]) &
                         (ds.latitude > area[1]) &
                         (ds.latitude < area[3]))
            ds = ds.where(selection)
        lons = ds.longitude.to_masked_array()
        lats = ds.latitude.to_masked_array()
        us = ds.ugos.to_masked_array()
        vs = ds.vgos.to_masked_array()
        ssha = ds.ssha_filtered.to_masked_array()
        mss = ds.mss.to_masked_array()
        mdt = ds.mdt.to_masked_array()
        # Remove across-track rows that contain only NaNs
        mask_row = np.all(np.isnan(us), axis=1)
        u = us[~mask_row, :]
        v = vs[~mask_row, :]
        lon = lons[~mask_row, :]
        lat = lats[~mask_row, :]
        ssha = ssha[~mask_row, :]
        mss = mss[~mask_row, :]
        mdt = mdt[~mask_row, :]
        # Absolute Dynamic Topography (ADT)
        adt = ssha + mdt
    else:
        # Generic loader (e.g. DUACS): variable names come from varn
        filei = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei)
        lon = ds[varn['longitude']][:]
        lat = ds[varn['latitude']][:]
        ssha = ds[varn['ssh']][:]
        adt = ds['adt'][:]
        u = ds[varn['u']][:]
        v = ds[varn['v']][:]
    return {
        'lon' : lon,
        'lat' : lat,
        'ssh' : ssha,
        'u'   : u,
        'v'   : v,
        'ssha': ssha,
        'adt' : adt
    }
The jj2date function converts Julian day numbers into standard calendar dates with time components, a crucial operation when working with oceanographic datasets that use different time representations.
jj2date(data, origin)
Parameters:
- `data`: Array of Julian day values (can be fractional)
- `origin`: Reference date corresponding to Julian day 0

Julian days provide a continuous count of days and fractions of days since a reference epoch, making them convenient for calculating time intervals in scientific applications. However, human-readable calendar dates (YYYY-MM-DD hh:mm:ss) are more intuitive for data analysis and reporting.
The conversion process involves two main steps:

1. Integer portion conversion: The whole number part of each Julian day is added to the origin date using pandas' to_timedelta function to obtain the calendar date.
2. Fractional portion conversion: The decimal part of each Julian day is converted to hours, minutes, and seconds (the fraction is multiplied by 24 to give hours, and the successive remainders are multiplied by 60 to give minutes and seconds).

For example, a Julian day value of 43.75 with origin '2000-01-01' yields 2000-02-13 18:00:00.
This function is particularly useful when processing MVP data that might store time information in Julian day format but needs to be aligned with SWOT satellite pass times, which are typically recorded in calendar dates.
### 2.1.2 Julian-day to calendar conversion
def jj2date(data, origin):
    """
    Convert fractional Julian days into calendar dates (YYYY-MM-DD hh:mm:ss).
    """
    data = np.asarray(data)
    origin = pd.to_datetime(origin)
    times = []
    for d in data:
        day = (origin + pd.to_timedelta(d, unit='D')).strftime('%Y-%m-%d')
        h = int((d % 1) * 24)
        m = int(((d % 1) * 24 % 1) * 60)
        s = int((((d % 1) * 24 % 1) * 60 % 1) * 60)
        times.append(f"{day} {h:02d}:{m:02d}:{s:02d}")
    return np.array(times)
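The 43.75-day example from the text can be checked directly with pandas, which applies the integer and fractional parts in a single step:

```python
import pandas as pd

origin = pd.to_datetime('2000-01-01')
jj = 43.75                                     # fractional Julian day
stamp = origin + pd.to_timedelta(jj, unit='D')
print(stamp)                                   # 2000-02-13 18:00:00
```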
The extract_values function implements a recursive algorithm to extract individual values from deeply nested data structures, which is particularly useful when working with MATLAB files converted to Python.
extract_values(nested_array)
Parameter:
- `nested_array`: A complex, potentially multi-level nested structure of arrays, lists, or tuples

MATLAB files (.mat) loaded via scipy.io.loadmat() often contain deeply nested arrays with non-intuitive structures. This complexity arises from the translation between MATLAB's matrix-oriented data storage and Python's object hierarchy. For example, a simple MATLAB scalar can become a nested NumPy array when loaded in Python.
The function handles three main cases:

- NumPy arrays: if the array has object dtype ('O'), it likely contains other arrays or objects, so we recurse into it; otherwise the scalar value is extracted with item().
- Tuples: the tuple is traversed recursively, element by element.
- Other values (including basic Python types): appended directly to the output list.
The recursion strategy uses an inner function recurse() that modifies the values list in the outer scope, accumulating all primitive values encountered during traversal. This approach efficiently flattens structures of arbitrary depth and complexity.
In our workflow, this function is particularly valuable when extracting date information or metadata from MVP .mat files, where the nested structure might otherwise make direct access challenging. By flattening these complex structures, we can more easily obtain the values needed for subsequent processing steps.
### 2.1.3 Recursive flattening of nested structures
def extract_values(nested_array):
    """
    Flatten arbitrarily nested arrays, tuples or lists into a single list.
    """
    values = []

    def recurse(arr):
        for element in arr:
            if isinstance(element, np.ndarray):
                if element.dtype == 'O':
                    # Object arrays wrap other arrays/objects: descend into them
                    recurse(element)
                else:
                    values.append(element.item())
            elif isinstance(element, tuple):
                # Tuples are themselves iterable containers: descend directly
                recurse(element)
            else:
                values.append(element)

    recurse(nested_array)
    return values
The rotate_vector function performs a standard 2D rotation of velocity vectors, which is essential for aligning velocities measured in different reference frames.
rotate_vector(u, v, angle_deg)
Parameters:
- `u`: East-west (zonal) component of the vector
- `v`: North-south (meridional) component of the vector
- `angle_deg`: Rotation angle in degrees (positive for counterclockwise rotation)

Vector rotation is mathematically described by the transformation:

$$ \begin{bmatrix} u_{rot} \\ v_{rot} \end{bmatrix} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} $$

where $\theta$ is the rotation angle in radians.
This function is particularly useful when we need to express velocities in a common reference frame, for example rotating east-north components into along-track and across-track directions.
In oceanographic applications, rotations are often needed when comparing velocity fields from different sources. For example, MVP-derived velocities are naturally oriented perpendicular to the ship's track, while SWOT velocities are provided in standard east-north coordinates. Rotating these vectors to a common reference frame facilitates direct comparison.
The function first converts the input angle from degrees to radians using NumPy's deg2rad function, then applies the standard rotation matrix operations to compute the rotated components. The result is returned as a tuple containing the rotated u and v components.
### 2.2.1 Planar rotation
def rotate_vector(u, v, angle_deg):
    """
    Rotate the vector (u, v) by angle_deg degrees.
    """
    angle_rad = np.deg2rad(angle_deg)
    u_rot = u * np.cos(angle_rad) - v * np.sin(angle_rad)
    v_rot = u * np.sin(angle_rad) + v * np.cos(angle_rad)
    return u_rot, v_rot
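A quick sanity check of the rotation (the function body is repeated here so the cell runs standalone): rotating a unit eastward vector by 90° counterclockwise should yield a unit northward vector.

```python
import numpy as np

def rotate_vector(u, v, angle_deg):
    # Same planar rotation as above, repeated so this cell is standalone
    angle_rad = np.deg2rad(angle_deg)
    u_rot = u * np.cos(angle_rad) - v * np.sin(angle_rad)
    v_rot = u * np.sin(angle_rad) + v * np.cos(angle_rad)
    return u_rot, v_rot

# Unit eastward vector rotated 90 deg counterclockwise points north
u_rot, v_rot = rotate_vector(1.0, 0.0, 90.0)
print(round(float(u_rot), 10), round(float(v_rot), 10))  # 0.0 1.0
```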
The moving_average function implements a centered moving average filter that smooths time series or spatial data by averaging values within a sliding window.
moving_average(x, interval)
Parameters:
- `x`: Input array to be smoothed
- `interval`: Window size for averaging (must be odd to ensure symmetry)

Moving averages are fundamental tools in oceanographic data processing, used to suppress small-scale noise and measurement error so that larger-scale structure stands out.

For a given point in the input array, the centered moving average computes the mean of all values within a window of size interval centered on that point. Mathematically, this is expressed as:

$$ y_i = \frac{1}{interval} \sum_{j=i-start}^{i+start} x_j $$

where $start = \frac{interval-1}{2}$ (integer division)
The implementation first calculates the starting offset (start) to ensure the window is properly centered. Then, for each valid position in the array (excluding points too close to the edges), it computes the mean of all values within the window.
Key characteristics of this implementation:

- Edge points (within `start` positions of the edges) are excluded from the output
- The output length is `len(x) - 2*start`, which is shorter than the input array

In our oceanographic context, this function can be used to smooth MVP or satellite data spatially, reducing small-scale variations to focus on the larger oceanographic features of interest, such as fronts and eddies.
### 2.2.2 Moving average (centred window)
def moving_average(x, interval):
    """
    Compute a moving average of x over a centred window of length = interval.
    """
    start = (interval - 1) // 2
    liste_moyennes = []
    for i in range(start, len(x) - start):
        liste_moyennes.append(np.mean(x[i - start:i + start + 1]))
    return liste_moyennes
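A small worked example (the function is repeated here, with results cast to plain floats, so the cell runs standalone): a window of 3 over five points returns three smoothed values, losing one point at each edge.

```python
import numpy as np

def moving_average(x, interval):
    # Same centred moving average as above, cast to plain floats
    start = (interval - 1) // 2
    return [float(np.mean(x[i - start:i + start + 1]))
            for i in range(start, len(x) - start)]

smoothed = moving_average([1, 2, 3, 4, 5], 3)
print(smoothed)  # [2.0, 3.0, 4.0]; the two edge points are excluded
```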
The project_vector function computes the vector projection of one 2D vector onto another, which is essential for comparing velocity components along specific directions.
project_vector(uA, vA, uB, vB)
Parameters:
- `uA`, `vA`: Components of the vector to be projected
- `uB`, `vB`: Components of the direction vector to project onto

Vector projection is a fundamental operation in vector calculus and has important applications in oceanographic data analysis. When comparing velocities from different sources (e.g., SWOT vs. MVP), we often need to examine how much of one velocity field aligns with another.
The mathematical process involves three steps:
Normalization of the direction vector (uB, vB) to create a unit vector:
$$\hat{\mathbf{B}} = \frac{\mathbf{B}}{|\mathbf{B}|} = \frac{(u_B, v_B)}{\sqrt{u_B^2 + v_B^2}}$$
Scalar projection of vector A onto the unit vector B̂:
$$A_{proj} = \mathbf{A} \cdot \hat{\mathbf{B}} = u_A \cdot \hat{u}_B + v_A \cdot \hat{v}_B$$
Vector projection calculation:
$$\mathbf{A}_{proj} = A_{proj} \cdot \hat{\mathbf{B}} = A_{proj} \cdot (\hat{u}_B, \hat{v}_B)$$
The function includes a safety check for when vector B has zero magnitude, in which case it returns a zero vector to avoid division by zero.
In our oceanographic context, this function is particularly useful for projecting SWOT velocities onto the direction of the MVP-derived velocities, which are perpendicular to the ship's track.
This enables direct comparison between satellite-derived velocities and in-situ measurements, even when they're originally represented in different coordinate systems.
### 2.2.3 Projection of one vector onto another
def project_vector(uA, vA, uB, vB):
    """
    Project vector (uA, vA) onto the direction of (uB, vB).
    """
    normB = np.sqrt(uB**2 + vB**2)
    if normB == 0:
        # Zero-length direction vector: return a zero projection
        return 0., 0.
    uB_hat, vB_hat = uB / normB, vB / normB
    scalar_proj = uA * uB_hat + vA * vB_hat
    return scalar_proj * uB_hat, scalar_proj * vB_hat
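A quick standalone check (the function body is repeated here): projecting (1, 1) onto the x-axis direction keeps only the zonal component.

```python
import numpy as np

def project_vector(uA, vA, uB, vB):
    # Same projection as above, repeated so this cell is standalone
    normB = np.sqrt(uB**2 + vB**2)
    if normB == 0:
        return 0., 0.
    uB_hat, vB_hat = uB / normB, vB / normB
    scalar_proj = uA * uB_hat + vA * vB_hat
    return scalar_proj * uB_hat, scalar_proj * vB_hat

# (1, 1) projected onto the x-axis keeps only the zonal part
up, vp = project_vector(1.0, 1.0, 1.0, 0.0)
print(float(up), float(vp))  # 1.0 0.0
```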
The extract_key_swot function implements a pattern-matching approach to extract acquisition dates from SWOT satellite data filenames, which is essential for temporal alignment with MVP measurements.
extract_key_swot(filename)
Parameter:
- `filename`: String containing the full SWOT data filename

SWOT filenames follow a standardized naming convention that includes a timestamp in the format YYYYMMDDTHHMMSS, where:

- `YYYYMMDD` represents the date (year, month, day)
- `T` is a literal separator
- `HHMMSS` represents the time (hours, minutes, seconds)

The function uses a regular expression pattern `r'\d{8}T\d{6}'` to identify this timestamp within the filename:

- `\d{8}` matches exactly 8 digits (the date portion)
- `T` matches the literal character 'T'
- `\d{6}` matches exactly 6 digits (the time portion)

Once the timestamp is extracted, the function parses its first eight characters with `datetime.strptime` using the format code `'%Y%m%d'`, then converts the result to a date object with `.date()`. If no matching timestamp is found in the filename, the function returns None.
This function serves as a key sorting criterion when organizing SWOT data files chronologically, which is necessary for matching satellite passes with the corresponding MVP transects. It ensures that we can efficiently identify which SWOT observations correspond to the same day as a particular MVP survey, enabling direct comparison between the two datasets.
### 2.2.4 Date extraction from SWOT filenames
def extract_key_swot(filename):
    """
    Return the acquisition date parsed from a SWOT filename, or None.
    """
    match = re.search(r'\d{8}T\d{6}', filename)
    if match:
        return datetime.strptime(match.group(0)[:8], '%Y%m%d').date()
    return None
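A standalone check of the pattern matching, using a hypothetical filename that follows the YYYYMMDDTHHMMSS convention (this is not a real product name):

```python
import re
from datetime import datetime

# Hypothetical filename with an embedded YYYYMMDDTHHMMSS timestamp
fname = "SWOT_L3_LR_SSH_Expert_20230415T061530_example.nc"

match = re.search(r'\d{8}T\d{6}', fname)
acq_date = datetime.strptime(match.group(0)[:8], '%Y%m%d').date()
print(acq_date)  # 2023-04-15
```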
The load_SWOT function handles the crucial task of spatially co-locating SWOT satellite altimetry data with MVP in-situ measurement positions, which is essential for direct comparison between the two datasets.
load_SWOT(mvp, dayv, swotdir, savefig, **kwargs)
Parameters:
- `mvp`: Dictionary containing MVP data, including longitude and latitude positions
- `dayv`: Date string in the format "YYYY-MM-DD" for the desired SWOT pass
- `swotdir`: Directory path where SWOT files are stored
- `savefig`: Directory for saving figures (not used in this function but maintained for interface consistency)
- `**kwargs`: Additional parameters (not used in this function)

The function performs several key operations:
Geographic domain determination: First, it defines a rectangular region around the MVP transect by rounding the bounding box outward to whole degrees and adding a 0.5° buffer in all directions:
domain = [np.floor(min_longitude) - 0.5, np.floor(min_latitude) - 0.5,
          np.ceil(max_longitude) + 0.5, np.ceil(max_latitude) + 0.5]
SWOT file identification: It uses pattern matching to find the appropriate SWOT file for the specified date:
fname = glob.glob(swotdir + 'SWOT_L3_LR_SSH_Expert_*_' +
dayv_dt.strftime('%Y%m%d') + '*_*.nc')
Data loading: The function calls loadsshuv to extract SSH and velocity fields from the identified file, filtering by the calculated geographic domain.
Spatial interpolation: Finally, it uses scipy.interpolate.griddata to interpolate the SWOT Absolute Dynamic Topography (ADT) field onto the exact MVP sampling locations using linear interpolation:
adtc = griddata(points, valuesadt, (mvp['lon'], mvp['lat']), method='linear')
This interpolation step is critical for direct point-by-point comparison between satellite and in-situ measurements. Since SWOT sampling points rarely coincide exactly with MVP measurement locations, interpolation provides values at the precise coordinates where MVP data was collected.
The function returns the interpolated ADT values along the MVP transect, allowing for subsequent comparative analysis between the satellite-derived sea surface height and the dynamic height calculated from MVP temperature and salinity profiles.
### 3.1.1 SWOT interpolation onto MVP positions
def load_SWOT(mvp, dayv, swotdir, savefig, **kwargs):
    """
    Load SWOT data and interpolate ADT onto MVP sampling points.
    """
    domain = [np.floor(np.min(mvp['lon'])) - 0.5,
              np.floor(np.min(mvp['lat'])) - 0.5,
              np.ceil(np.max(mvp['lon'])) + 0.5,
              np.ceil(np.max(mvp['lat'])) + 0.5]
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn = {'longitude': 'longitude', 'latitude': 'latitude',
            'u': 'ugos', 'v': 'vgos', 'ssh': 'ssha_filtered'}
    fname = glob.glob(swotdir + 'SWOT_L3_LR_SSH_Expert_*_' +
                      dayv_dt.strftime('%Y%m%d') + '*_*.nc')
    if not fname:
        print(f"No SWOT file found in load_SWOT for date {dayv}")
        return None
    swot = loadsshuv(fname[0], '', varn, type='swot', area=domain)
    points = np.array((swot['lon'].flatten(), swot['lat'].flatten())).T
    valuesadt = swot['adt'].flatten()
    adtc = griddata(points, valuesadt,
                    (mvp['lon'], mvp['lat']), method='linear')
    return np.squeeze(adtc)
The load_DUACS function performs a similar co-location process as load_SWOT, but specifically for DUACS (Data Unification and Altimeter Combination System) altimetry products, which provide merged data from multiple satellite missions.
load_DUACS(mvp, dayv, duacsdir, savefig, **kwargs)
Parameters:
- mvp: Dictionary containing MVP data including longitude and latitude
- dayv: Date string in "YYYY-MM-DD" format
- duacsdir: Directory containing DUACS data files
- savefig: Directory for saving figures (not used but maintained for consistency)
- **kwargs: Additional parameters (not used)

DUACS represents an important reference dataset in our validation framework because it provides a well-established, multi-mission gridded product against which SWOT's novel wide-swath measurements can be benchmarked.
The function differs from load_SWOT in several notable ways:
No explicit domain specification: Unlike load_SWOT, this function does not subset the data spatially. DUACS files typically cover predefined regions, so the entire file is loaded.
Different file naming convention: DUACS follows its own naming pattern:
'nrt_europe_allsat_phy_l4_' + date + '_*.nc'
where 'nrt' stands for Near-Real-Time and 'l4' indicates Level-4 processing (gridded, gap-free fields).
Different variable mapping: The variable names dictionary maps 'ssh' to 'sla' (Sea Level Anomaly) rather than 'ssha_filtered' as in SWOT.
Mesh grid creation: DUACS data typically comes on a regular latitude-longitude grid, requiring creation of a mesh grid before flattening for interpolation:
X, Y = np.meshgrid(duacs['lon'].values, duacs['lat'].values)
The interpolation process is otherwise identical to that used for SWOT data, employing linear interpolation to estimate ADT values at MVP sampling locations.
By providing both SWOT and DUACS interpolated fields at MVP points, we can perform a three-way comparison that helps identify which differences might be specific to SWOT and which are common to satellite altimetry in general.
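As a concrete illustration, the DUACS filename pattern described above can be assembled from a date string (the date below is an arbitrary example, not taken from the campaign):

```python
from datetime import datetime

# arbitrary example date, for illustration only
day = datetime.strptime("2023-04-10", "%Y-%m-%d")
pattern = 'nrt_europe_allsat_phy_l4_' + day.strftime('%Y%m%d') + '_*.nc'
# the resulting pattern can then be passed to glob.glob() to locate the Level-4 file
```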
### 3.1.2 DUACS interpolation onto MVP positions
def load_DUACS(mvp, dayv, duacsdir, savefig, **kwargs):
    """
    Load DUACS data and interpolate ADT onto MVP sampling points.
    """
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn = {'longitude': 'longitude', 'latitude': 'latitude',
            'u': 'ugos', 'v': 'vgos', 'ssh': 'sla'}
    fname = glob.glob(duacsdir + 'nrt_europe_allsat_phy_l4_' +
                      dayv_dt.strftime('%Y%m%d') + '_*.nc')
    if not fname:
        print(f"No DUACS file found in load_DUACS for date {dayv}")
        return None
    duacs = loadsshuv(fname[0], '', varn)
    X, Y = np.meshgrid(duacs['lon'].values, duacs['lat'].values)
    points = np.array((X.flatten(), Y.flatten())).T
    valuesadt = duacs['adt'].values.flatten()
    adtd = griddata(points, valuesadt,
                    (mvp['lon'], mvp['lat']), method='linear')
    return np.squeeze(adtd)
This code block creates a comprehensive inventory of available SWOT data files and extracts their acquisition dates, which is essential for temporal matching with MVP surveys.
The process consists of three main steps:
File discovery: Using the glob module, all SWOT Level-3 Low Resolution (LR) Sea Surface Height (SSH) Expert files are located in the specified directory:
swot_files = glob.glob('/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/SWOT_L3_LR_SSH_Expert_*_v2.0.0.nc')
These files contain processed SWOT data with various corrections already applied, making them suitable for scientific analysis.
Chronological sorting: The files are sorted by their acquisition dates using the previously defined extract_key_swot function as the sorting key:
swot_files.sort(key=extract_key_swot)
This ensures that the file list is organized chronologically, which simplifies subsequent temporal matching operations.
Date extraction: For each file, the acquisition date is extracted from a specific position in the filename (the 8th element after splitting by underscores) and converted to a datetime object:
date_strs = os.path.splitext(filename)[0].split('_')[7:8]
for date_str in date_strs:
dates_swot.append(datetime.strptime(date_str[:8], '%Y%m%d').date())
The resulting dates_swot list contains the acquisition dates of all available SWOT passes, which will be used later to find the SWOT observations that temporally coincide with each MVP transect.
This temporal alignment is crucial for meaningful validation, as oceanographic conditions can change rapidly. Ideally, the satellite and in-situ measurements should be acquired as close in time as possible to ensure they're observing the same oceanic state.
## 3.2 SWOT file catalogue and acquisition dates
swot_files = glob.glob('/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/'
'SWOT_L3_LR_SSH_Expert_*_v2.0.0.nc')
swot_files.sort(key=extract_key_swot)
dates_swot = []
for files in swot_files:
    filename = os.path.basename(files)
    date_strs = os.path.splitext(filename)[0].split('_')[7:8]
    for date_str in date_strs:
        dates_swot.append(datetime.strptime(date_str[:8], '%Y%m%d').date())
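The date-extraction logic above can be checked against a synthetic filename (the cycle/pass numbers and times below are made up for illustration):

```python
import os
from datetime import datetime

# hypothetical SWOT L3 filename following the convention used above
fn = 'SWOT_L3_LR_SSH_Expert_477_012_20230410T060000_20230410T070000_v2.0.0.nc'
date_str = os.path.splitext(fn)[0].split('_')[7]   # 8th token: start datetime
acq_date = datetime.strptime(date_str[:8], '%Y%m%d').date()
# acq_date is 2023-04-10
```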
This code block processes geographical coordinates to define the boundary of an oceanographic frontal zone, which will be visualized on maps to provide context for the MVP and SWOT measurements.
Ocean fronts are boundaries between different water masses, often characterized by strong horizontal gradients in properties like temperature, salinity, and density. These fronts can significantly influence ocean circulation and biological productivity, making them important features to identify in our analysis.
The process follows these steps:
Load frontal zone data: A CSV file containing coordinates of points identified as belonging to the frontal region is loaded using pandas:
df_f = pd.read_csv("/Volumes/Extreme SSD/MIO/tsg_crown_AFB.csv")
Filter for the specific region: Only points labeled as region 'F' (front) are selected:
region_f_df = df_f[df_f['Region'] == 'F']
Extract coordinate pairs: The longitude and latitude values are extracted as a NumPy array, with any rows containing NaN values removed:
coords_f = region_f_df[['Longitude', 'Latitude']].dropna().values
Compute the convex hull: If there are sufficient points (more than 2), a convex hull is calculated using SciPy's ConvexHull function:
if len(coords_f) > 2:
    hull_f = ConvexHull(coords_f)
    polygon_points_f = coords_f[hull_f.vertices]
else:
    polygon_points_f = coords_f
The convex hull is the smallest convex polygon that encloses all the points, providing a simplified boundary of the frontal zone.
The resulting polygon_points_f array contains the ordered vertices of the polygon that defines the frontal boundary. This polygon will be plotted on maps in subsequent visualizations, helping to contextualize the MVP measurements and SWOT observations relative to this important oceanographic feature.
By identifying and visualizing the frontal zone, we can examine how well SWOT captures the dynamic height gradients and associated geostrophic velocities across this oceanographically significant feature.
## 3.3 Frontal region convex-hull computation
df_f = pd.read_csv("/Volumes/Extreme SSD/MIO/tsg_crown_AFB.csv")
region_f_df = df_f[df_f['Region'] == 'F']
coords_f = region_f_df[['Longitude', 'Latitude']].dropna().values
if len(coords_f) > 2:
    hull_f = ConvexHull(coords_f)
    polygon_points_f = coords_f[hull_f.vertices]
else:
    polygon_points_f = coords_f
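A minimal check of the hull construction, using synthetic points (a unit square with one interior point):

```python
import numpy as np
from scipy.spatial import ConvexHull

pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
hull = ConvexHull(pts)
polygon = pts[hull.vertices]   # ordered corner points; the interior point is dropped
```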
The plot_fronte function visualizes the previously computed frontal zone boundary on map figures, highlighting this important oceanographic feature for contextual analysis.
plot_fronte(m)
Parameter:
m: A Basemap object on which to draw the frontal boundary

The function iterates through the vertices of the frontal polygon and connects consecutive points with line segments, creating a closed polygon that represents the boundary of the front. Several design choices enhance the visual representation:
Adaptive line style: The function intelligently selects the line style based on the orientation of each segment:
style = '--' if abs(lon2 - lon1) > abs(lat2 - lat1) else ':'
- Dashed lines (--) are used for predominantly east-west segments
- Dotted lines (:) are used for predominantly north-south segments

This helps visually distinguish the orientation of different parts of the front.
Consistent formatting: All segments are drawn in purple with a linewidth of 2, making the front easily distinguishable from other map elements.
High z-order (15): The front is drawn above most other map elements to ensure visibility.
Legend entry: Only the first segment is labeled in the legend as "Front" to avoid multiple duplicate entries.
The function includes a safety check to ensure there are at least two points available for drawing; otherwise, it returns without action.
In oceanographic analysis, fronts are critical features that often represent boundaries between different water masses with distinct properties. They can drive vertical circulation, enhance biological productivity, and influence the propagation of waves and eddies. By clearly marking the frontal boundary on our maps, we can better interpret the patterns observed in both the MVP and SWOT data, particularly examining how well satellite altimetry captures the sea surface signatures of these dynamic features.
### 3.3.1 Frontal hull plotting
def plot_fronte(m):
    """
    Draw the convex hull of the front in violet.
    """
    if len(polygon_points_f) < 2:
        return
    for i in range(len(polygon_points_f)):
        j = (i + 1) % len(polygon_points_f)
        lon1, lat1 = polygon_points_f[i]
        lon2, lat2 = polygon_points_f[j]
        x1, y1 = m(lon1, lat1)
        x2, y2 = m(lon2, lat2)
        style = '--' if abs(lon2 - lon1) > abs(lat2 - lat1) else ':'
        if i == 0:
            m.plot([x1, x2], [y1, y2], color='purple',
                   linestyle=style, linewidth=2,
                   zorder=15, label='Front')
        else:
            m.plot([x1, x2], [y1, y2], color='purple',
                   linestyle=style, linewidth=2, zorder=15)
When comparing velocity measurements from different sources along a ship track, we need specialized functions to calculate components in specific directions. This section introduces functions that compute signed velocity components relative to MVP transects.
The compute_signed_speed_along_transect function calculates the component of velocity perpendicular to the local direction of a transect, with sign indicating direction.
compute_signed_speed_along_transect(u, v, lon, lat)
Parameters:
- u, v: Zonal and meridional velocity components
- lon, lat: Longitude and latitude coordinates defining the transect

This function is essential for comparing satellite and in-situ velocities since MVP naturally measures the velocity component perpendicular to the ship's track. For a meaningful comparison, we need to project SWOT velocities in the same direction.
For each point along the transect, the algorithm follows these steps:
Determine local transect direction: For each point i, calculate the vector (dx, dy) representing the local direction of the transect, using a forward difference at the first point, a backward difference at the last point, and a centred difference elsewhere.
Compute normal vector: The normal vector is perpendicular to the transect direction:
nx, ny = dy, -dx # 90-degree rotation of (dx, dy)
Normalize to unit vector:
nx, ny = nx / norm_n, ny / norm_n # where norm_n = √(nx² + ny²)
Calculate projection: The scalar (dot) product gives the component along the normal:
w_signed[i] = u[i]*nx + v[i]*ny
This results in a signed velocity that is positive when the flow crosses the transect in the direction of the normal vector and negative when it crosses in the opposite direction.
The function includes safety checks for edge cases: a single-point transect and a degenerate (near-zero-length) local direction vector both yield a zero component.
This calculation enables direct comparison between MVP-measured velocities and satellite-derived velocities, despite their different native reference frames and measurement principles.
## 4.1 Signed component perpendicular to transect
def compute_signed_speed_along_transect(u, v, lon, lat):
    """
    Compute the signed scalar component along the direction perpendicular
    to the transect defined by (lon, lat).
    """
    N_val = len(u)
    w_signed = np.zeros(N_val)
    for i in range(N_val):
        if N_val == 1:
            w_signed[i] = 0.0
            continue
        if i == 0:
            dx = lon[i+1] - lon[i]
            dy = lat[i+1] - lat[i]
        elif i == N_val-1:
            dx = lon[i] - lon[i-1]
            dy = lat[i] - lat[i-1]
        else:
            dx = lon[i+1] - lon[i-1]
            dy = lat[i+1] - lat[i-1]
        nx, ny = dy, -dx
        norm_n = np.sqrt(nx**2 + ny**2)
        if norm_n < 1e-10:
            w_signed[i] = 0.0
            continue
        nx, ny = nx / norm_n, ny / norm_n
        w_signed[i] = u[i]*nx + v[i]*ny
    return w_signed
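A quick sanity check of this projection, using a compact re-implementation of the same finite-difference scheme (copied here only so the snippet is self-contained): for an eastward transect the unit normal is (0, −1), so a uniform northward flow should give a cross-track component of −1 everywhere.

```python
import numpy as np

def signed_cross_track(u, v, lon, lat):
    # same scheme as compute_signed_speed_along_transect above:
    # forward/backward differences at the ends, centred in the interior
    n = len(u)
    w = np.zeros(n)
    for i in range(n):
        if n == 1:
            continue
        j0, j1 = max(i - 1, 0), min(i + 1, n - 1)
        dx, dy = lon[j1] - lon[j0], lat[j1] - lat[j0]
        nx, ny = dy, -dx                      # 90° rotation of the tangent
        norm_n = np.hypot(nx, ny)
        if norm_n < 1e-10:
            continue
        w[i] = (u[i] * nx + v[i] * ny) / norm_n
    return w

lon = np.array([0.0, 1.0, 2.0])
lat = np.zeros(3)
w = signed_cross_track(np.zeros(3), np.ones(3), lon, lat)   # -> [-1., -1., -1.]
```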
In oceanographic data analysis, understanding the direction of velocity vectors is crucial for interpreting ocean currents. The compute_signed_speed_tot() function calculates the total speed of a velocity vector while preserving its directional information relative to a transect line.
The function takes four key inputs: zonal velocity component u, meridional velocity component v, longitude coordinates lon, and latitude coordinates lat.
The core algorithm works through several computational steps:
Initialize an output array w_signed_tot filled with zeros, handling special cases like single-point input.
Compute local transect direction vectors by calculating coordinate differences between adjacent points. This involves determining the local geometry of the transect line.
Derive a normal vector perpendicular to the transect and normalize it.
Mathematically, the vector sign is determined by the dot product projection:
$$\text{sign} = \begin{cases} 1 & \text{if } \vec{v} \cdot \vec{n} \geq 0 \\ -1 & \text{if } \vec{v} \cdot \vec{n} < 0 \end{cases}$$
where $\vec{v} = (u, v)$ is the velocity vector and $\vec{n}$ is the unit normal to the local transect direction.
The final signed speed is computed as: $\text{signed speed} = \text{sign} \times \lVert \vec{v} \rVert$
This approach ensures that the velocity vector's magnitude and direction are preserved relative to the local transect geometry, providing a nuanced representation of ocean current dynamics.
## 4.2 Signed full-speed magnitude
def compute_signed_speed_tot(u, v, lon, lat):
    """
    Return signed total speed: |(u, v)| with sign determined by the
    projection onto (nx, ny).
    """
    N_val = len(u)
    w_signed_tot = np.zeros(N_val)
    for i in range(N_val):
        if N_val == 1:
            w_signed_tot[i] = 0.0
            continue
        if i == 0:
            dx = lon[i+1] - lon[i]
            dy = lat[i+1] - lat[i]
        elif i == N_val-1:
            dx = lon[i] - lon[i-1]
            dy = lat[i] - lat[i-1]
        else:
            dx = lon[i+1] - lon[i-1]
            dy = lat[i+1] - lat[i-1]
        nx, ny = dy, -dx
        norm_n = np.sqrt(nx**2 + ny**2)
        speed = np.sqrt(u[i]**2 + v[i]**2)
        if norm_n < 1e-10:
            w_signed_tot[i] = 0.0
            continue
        nx, ny = nx / norm_n, ny / norm_n
        sign_ = 1.0 if (u[i]*nx + v[i]*ny) >= 0 else -1.0
        w_signed_tot[i] = sign_ * speed
    return w_signed_tot
The MVP (Moving Vessel Profiler) Transect Processing Loop represents the core analytical workflow of this oceanographic study. This comprehensive section transforms raw in situ measurements into refined scientific insights by systematically processing each marine transect collected during the BioSWOT-Med campaign.
The processing loop implements a multi-stage data transformation strategy: each iteration loads one transect's metadata and profiles, computes dynamic height, interpolates and smooths it along-track, derives geostrophic velocities, and co-locates the temporally matching SWOT pass for comparison.
This methodological approach allows for a rigorous, systematic investigation of submesoscale ocean dynamics in the Northwestern Mediterranean, bridging in situ measurements with satellite altimetry.
The preliminary setup initializes the processing loop for Moving Vessel Profiler (MVP) transects in the BioSWOT-Med campaign. The code prepares to systematically analyze a range of profile lines (PLs) from 5 to 24.
Key elements of this setup:
- np.arange(5, 25) creates a NumPy array of integers from 5 to 24 (note that arange excludes the upper bound)
- PLs stores the list of profile line numbers to be processed
- tqdm() is a progress bar utility that provides visual feedback during the iterative processing of multiple transects

The subsequent for loop iterates through each profile line, applying the full processing workflow to each transect while reporting progress through the tqdm bar. Processing each transect (PL) individually enables detailed examination of local oceanographic characteristics in the Northwestern Mediterranean region.
## 5.0 Preliminary set-up
PLs = np.arange(5, 25)
for PL in tqdm(PLs):
This code segment handles the crucial task of extracting and converting date information from MATLAB (.mat) files for each MVP transect. The process involves several key transformations:
- File loading: scipy.io.loadmat() reads the MATLAB metadata files
- Date extraction: the extract_values() function recursively flattens the nested date arrays
- Date conversion: MATLAB's serial date representation is transformed to a Python datetime by applying datetime.fromordinal() to the integer part, adding the fractional part as a timedelta(), and subtracting the 366-day offset between the MATLAB and Python epochs

Mathematical representation of the date conversion: $\text{Python Date} = \text{Ordinal Date} + \text{Fractional Days} - 366$
This meticulous date processing ensures accurate temporal mapping of MVP transect measurements, which is critical for aligning in situ data with satellite observations in the BioSWOT-Med campaign.
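The conversion can be exercised on a single hypothetical MATLAB serial date (the value below is arbitrary, chosen only to land at noon on some day):

```python
from datetime import datetime, timedelta

matlab_datenum = 738975.5   # arbitrary example serial date (the .5 means noon)
py_dt = (datetime.fromordinal(int(matlab_datenum))
         + timedelta(days=matlab_datenum % 1)
         - timedelta(days=366))
# the 366-day shift accounts for MATLAB counting from year 0, Python from year 1
```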
## 5.1.1 Read .mat metadata and convert dates
mat_file = sio.loadmat(
'/Volumes/Extreme SSD/MIO/MVP_TOOLS/BIOSWOTMED/'
f'MVP/L2/PL{PL}.mat'
)
date_extract = extract_values(mat_file['metadata']['date'])
date_pd = pd.DataFrame(date_extract)
datejj = date_pd[0]
dates_mvp = [
datetime.fromordinal(int(d[0][0])) +
timedelta(days=d[0][0] % 1) -
timedelta(days=366)
for d in datejj
]
This code segment performs critical oceanographic calculations to derive dynamic height, transforming raw measurements into scientifically meaningful data:
- Data loading: positions, pressure, temperature, and practical salinity are read from the Python-converted .mat file
- Thermodynamic conversions: gsw.SA_from_SP and gsw.CT_from_t yield Absolute Salinity and Conservative Temperature (TEOS-10)
- Dynamic height computation: geo_strf_dyn_height() calculates the geostrophic streamfunction relative to the deepest valid pressure of each profile; dividing by g expresses it as a height in metres
- Anomaly calculation: the transect-mean dynamic height is subtracted to obtain the centred anomaly dhc

This processing converts raw measurements into a standardized format for further oceanographic analysis.
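In TEOS-10 terms, the quantity computed in the cell below is (a sketch of the standard definition; symbols as in the GSW documentation):

```latex
\Phi(p_{\mathrm{ref}}) = \int_{p}^{p_{\mathrm{ref}}} \delta(S_A,\Theta,p')\,\mathrm{d}p',
\qquad
\eta_{\mathrm{dyn}} = \frac{\Phi}{g}
```

where $\delta$ is the specific volume anomaly, $p_{\mathrm{ref}}$ is the reference pressure (here the deepest valid level of each profile), and dividing the streamfunction $\Phi$ (m² s⁻²) by $g$ converts it to a dynamic height in metres.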
## 5.1.2 Oceanographic variables and centred dynamic height anomaly
mat = sio.loadmat(
'/Volumes/Extreme SSD/MIO/MVP_TOOLS/BIOSWOTMED/'
f'MVP/L2_python/PL{PL}_python.mat'
)
lon_mvp, lat_mvp = mat['lon'], mat['lat']
p_mvp, t_mvp = mat['pres'], mat['temp']
SP_mvp = mat['sal']
SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
g_ = 9.81 # gravitational acceleration (m s⁻²)
dh = []
for i in range(SA.shape[0]):
    dh_i = (
        gsw.geo_strf_dyn_height(
            SA[i][~np.isnan(SA[i])],
            CT[i][~np.isnan(SA[i])],
            p_mvp[i][~np.isnan(SA[i])],
            p_ref=np.max(p_mvp[i][~np.isnan(SA[i])])
        )[0] / g_
    )
    dh.append(dh_i)
dhmean = np.mean(dh)
dhc = dh - dhmean # centred anomaly
df_mvp_raw = pd.DataFrame({
'dh' : dh,
'dhc': dhc,
'lon': lon_mvp[0],
'lat': lat_mvp[0]
})
This code segment computes the cumulative distance along the MVP transect using geodesic distance calculations:
- Distance initialization: the distance array starts at zero for the first point
- Geodesic distance computation: geopy.distance.distance() returns the distance (in km) between consecutive (lat, lon) pairs on the WGS-84 ellipsoid, which is accumulated along the track

Mathematically, the cumulative distance is computed as:
$$\text{dist}[i] = \text{dist}[i-1] + \text{geodesic\_distance}(\text{point}_{i-1}, \text{point}_i)$$
The resulting dist column in the DataFrame represents the progressive distance along the ship's track, critical for subsequent spatial interpolation and analysis of oceanographic data.
## 5.2.1 Cumulative distance along transect
dist_along_mvp = np.zeros(len(df_mvp_raw))
for i in range(1, len(df_mvp_raw)):
    dist_along_mvp[i] = (
        dist_along_mvp[i-1] +
        geopy.distance.distance(
            (df_mvp_raw['lat'].values[i-1],
             df_mvp_raw['lon'].values[i-1]),
            (df_mvp_raw['lat'].values[i],
             df_mvp_raw['lon'].values[i])
        ).km
    )
df_mvp_raw['dist'] = dist_along_mvp
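The accumulation can be sketched with a plain haversine formula instead of geopy's geodesic (the spherical values differ from the ellipsoidal ones by well under a percent); at 42°N one degree of longitude spans roughly 83 km:

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    # great-circle distance on a spherical Earth (radius 6371 km)
    p1, p2 = np.radians(lat1), np.radians(lat2)
    a = (np.sin((p2 - p1) / 2) ** 2
         + np.cos(p1) * np.cos(p2) * np.sin(np.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# toy three-point track: 0.2° eastward leg, then 0.1° northward leg
lats = [42.0, 42.0, 42.1]
lons = [5.0, 5.2, 5.2]
dist = np.zeros(3)
for i in range(1, 3):
    dist[i] = dist[i - 1] + haversine_km(lats[i - 1], lons[i - 1],
                                         lats[i], lons[i])
# dist is the monotonically increasing cumulative distance in km
```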
This code segment performs two critical data preprocessing steps: spatial interpolation and smoothing of oceanographic measurements.
- Interpolation parameters: a 1-km regular grid step (step_km) and an 11-km smoothing window (window_size_km)
- Linear interpolation: np.interp() maps the original measurements onto the regular distance grid
- Smoothing: a centred rolling mean applied with pandas' rolling() method, with the window length forced to be odd so that it is symmetric about each point

The resulting DataFrame provides a consistently gridded and smoothed dataset, prepared for further oceanographic analysis.
## 5.2.2 Interpolation to 1-km grid and 11-km smoothing
step_km = 1
window_size_km = 11
dist_max = df_mvp_raw['dist'].iloc[-1]
dist_reg = np.arange(0, dist_max + step_km, step_km)
lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
lat_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
dh_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
dhc_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])
window_size = int(window_size_km / step_km)
if window_size % 2 == 0:
    window_size += 1  # enforce odd window length
dh_reg_smooth = (
pd.Series(dh_reg)
.rolling(window=window_size, center=True)
.mean()
.values
)
dhc_reg_smooth = (
pd.Series(dhc_reg)
.rolling(window=window_size, center=True)
.mean()
.values
)
df_mvp = pd.DataFrame({
'dist' : dist_reg,
'lon' : lon_reg,
'lat' : lat_reg,
'dh_raw' : dh_reg,
'dhc_raw' : dhc_reg,
'dh' : dh_reg_smooth,
'dhc' : dhc_reg_smooth
}).dropna(subset=['dh', 'dhc']).copy()
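The smoothing step can be illustrated on a simple ramp: a centred rolling mean reproduces a linear signal exactly away from the edges, where the incomplete window yields NaN (hence the dropna above):

```python
import numpy as np
import pandas as pd

x = pd.Series(np.arange(10, dtype=float))         # 0, 1, ..., 9 on a 1-km grid
smooth = x.rolling(window=5, center=True).mean()  # centred 5-point mean
# smooth[2] == 2.0 (mean of 0..4); smooth[0] and smooth[1] are NaN
```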
This code segment calculates geostrophic velocity vectors from the Moving Vessel Profiler (MVP) dynamic height measurements, transforming vertical ocean structure data into horizontal current fields.
The geostrophic velocity calculation involves several key steps:
- Initial velocity computation: gsw.geostrophy.geostrophic_velocity() returns the cross-track geostrophic velocity, together with mid-point longitudes and latitudes, from the smoothed dynamic height
- Velocity vector transformation: the local track orientation is estimated with arctan2() from coordinate differences and used to rotate the cross-track speed into zonal (u) and meridional (v) components

The method handles edge cases by using different derivative approximations for the first, last, and intermediate points, ensuring a robust velocity field representation across the entire transect.
The result is a DataFrame enriched with zonal and meridional velocity components, ready for further oceanographic analysis.
## 5.3 Geostrophic velocity from MVP
w_mvp_smooth = gsw.geostrophy.geostrophic_velocity(
df_mvp['dh'].values,
df_mvp['lon'].values,
df_mvp['lat'].values
)
N_mvp = len(w_mvp_smooth[0])
df_mvp = df_mvp.iloc[:N_mvp].copy()
u_mvp = np.zeros(N_mvp)
v_mvp = np.zeros(N_mvp)
for i in range(N_mvp):
    if N_mvp > 1:
        if i == 0:
            dx = w_mvp_smooth[1][i+1] - w_mvp_smooth[1][i]
            dy = w_mvp_smooth[2][i+1] - w_mvp_smooth[2][i]
        elif i == N_mvp-1:
            dx = w_mvp_smooth[1][i] - w_mvp_smooth[1][i-1]
            dy = w_mvp_smooth[2][i] - w_mvp_smooth[2][i-1]
        else:
            dx = w_mvp_smooth[1][i+1] - w_mvp_smooth[1][i-1]
            dy = w_mvp_smooth[2][i+1] - w_mvp_smooth[2][i-1]
        angle = np.arctan2(dx, dy)
        u_mvp[i] = - w_mvp_smooth[0][i] * g_ * np.cos(angle)
        v_mvp[i] = w_mvp_smooth[0][i] * g_ * np.sin(angle)
df_mvp['u_mvp'] = u_mvp
df_mvp['v_mvp'] = v_mvp
This code segment performs a critical task of matching MVP transect dates with corresponding SWOT satellite observations, ensuring precise temporal alignment for subsequent analysis.
- Date matching: the first MVP measurement date is compared against the catalogue of SWOT acquisition dates to select the coincident pass
- Domain definition: a fixed geographic box covering the study region (3–6.5°E, 38.5–44°N)
- SWOT file location: glob searches for the SWOT NetCDF file matching the selected date
- Data extraction: SSH and velocity fields are read with the loadsshuv() function
- ADT anomaly computation: the spatial mean ADT is subtracted to obtain the anomaly field
The method ensures precise spatial and temporal synchronization between in situ MVP measurements and satellite observations, a crucial step in cross-validation of oceanographic data.
## 5.4.1 Identify corresponding SWOT file
dayv = None
for dsw in dates_swot:
    if dsw == dates_mvp[0].date():
        dayv = dsw.strftime('%Y-%m-%d')
        break
if not dayv:
    print(f"No matching (SWOT) date for PL{PL}")
    continue
domain = [3, 38.5, 6.5, 44]
dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
varn_swot = {
'longitude': 'longitude',
'latitude' : 'latitude',
'u' : 'ugos',
'v' : 'vgos',
'adt' : 'adt'
}
fname = glob.glob(
'/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/'
f'SWOT_L3_LR_SSH_Expert_*{dayv_dt.strftime("%Y%m%d")}*_*.nc'
)
if not fname:
    print(f"No SWOT file found for date: {dayv} - PL{PL}")
    continue
swot = loadsshuv(fname[0], '', varn_swot, type='swot', area=domain)
adt_flat_swot = swot['adt'].flatten()
lon_flat_swot = swot['lon'].flatten()
lat_flat_swot = swot['lat'].flatten()
u_swot_tot = swot['u'].flatten()
v_swot_tot = swot['v'].flatten()
adtmean_swot = np.nanmean(adt_flat_swot)
adtc_swot = adt_flat_swot - adtmean_swot
This code segment performs a spatial interpolation technique to map SWOT satellite data onto the MVP transect trajectory using a computationally efficient nearest-neighbor search.
- Data filtering: grid cells with NaN coordinates are removed before building the search structure
- Spatial nearest-neighbour search: a cKDTree from SciPy enables efficient spatial querying
- Coordinate matching: each MVP position is matched to its single closest valid SWOT grid cell (k=1)
- Data extraction: the u, v, and ADT values at the matched indices are pulled onto the transect
The technique ensures precise spatial alignment between in situ and satellite-derived measurements, critical for robust oceanographic cross-validation. By using nearest-neighbor interpolation, the method maintains the original resolution of both datasets while providing a direct point-to-point comparison.
## 5.4.2 Nearest-neighbour search to MVP transect
valid_mask = ~np.isnan(lon_flat_swot) & ~np.isnan(lat_flat_swot)
lon_valid = lon_flat_swot[valid_mask]
lat_valid = lat_flat_swot[valid_mask]
u_valid = u_swot_tot[valid_mask]
v_valid = v_swot_tot[valid_mask]
adt_valid = adt_flat_swot[valid_mask]
tree = cKDTree(np.c_[lon_valid, lat_valid])
_, indices = tree.query(
np.c_[df_mvp['lon'].values, df_mvp['lat'].values], k=1
)
u_swot = u_valid[indices]
v_swot = v_valid[indices]
adt_on_transect = adt_valid[indices]
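A minimal illustration of the nearest-neighbour lookup on a toy grid (note that querying raw lon/lat degrees treats the two axes isotropically, which slightly distorts distances with latitude; over a small domain this is acceptable):

```python
import numpy as np
from scipy.spatial import cKDTree

# three hypothetical SWOT grid points (lon, lat)
grid_lonlat = np.array([[5.0, 42.0], [5.1, 42.0], [5.0, 42.1]])
tree = cKDTree(grid_lonlat)

# query one "MVP" position; k=1 returns the single closest grid point
_, idx = tree.query(np.array([[5.04, 42.01]]), k=1)
nearest = grid_lonlat[idx[0]]   # (5.0, 42.0) is the closest grid point
```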
This code segment performs a scaling transformation on the velocity components, multiplying the MVP-derived velocities by a large coefficient (100,000).
The purpose of the scaling is purely visual: ocean currents are typically a few tens of centimetres per second, far too small to draw as arrows in map-projection coordinates. Multiplying the velocity components by coeff_scaled stretches the vectors to a visible length while preserving their relative magnitudes and directions.
Mathematically, the transformation is simply $u_{scaled} = 10^{5}\,u$ and $v_{scaled} = 10^{5}\,v$.
This scaling approach is common in oceanographic visualization, helping researchers better perceive subtle ocean current dynamics that might be imperceptible at their original, typically small magnitudes.
## 5.5 Scaling
coeff_scaled = 100000
u_mvp_scaled = df_mvp['u_mvp'].values * coeff_scaled
v_mvp_scaled = df_mvp['v_mvp'].values * coeff_scaled
This code segment computes geostrophic velocities by interpolating and processing Absolute Dynamic Topography (ADT) from SWOT satellite data.
- Spatial interpolation: griddata() linearly interpolates the ADT field onto the MVP positions
- Geostrophic velocity computation: gsw.geostrophy.geostrophic_velocity() derives the cross-track velocity from the interpolated ADT along the transect
- Velocity vector transformation: as for the MVP velocities, the local track orientation from arctan2() is used to rotate the cross-track speed into zonal and meridional components

The final step scales the velocity components by 100,000 to enhance visualization and numerical comparability, matching the MVP velocity scaling performed earlier.
## 5.5.1 Geostrophic velocity from interpolated SWOT ADT
adt_swot_interp = griddata(
(lon_valid, lat_valid),
adt_valid,
(df_mvp['lon'].values, df_mvp['lat'].values),
method='linear'
)
w_swot_calc_raw = gsw.geostrophy.geostrophic_velocity(
adt_swot_interp,
df_mvp['lon'].values,
df_mvp['lat'].values
)
N2 = min(len(w_swot_calc_raw[0]),
len(w_swot_calc_raw[1]),
len(w_swot_calc_raw[2]))
u_swot_calc = np.zeros(N2)
v_swot_calc = np.zeros(N2)
for i in range(N2):
    if N2 == 1:
        dx2 = dy2 = 0
    elif i == 0:
        dx2 = w_swot_calc_raw[1][i+1] - w_swot_calc_raw[1][i]
        dy2 = w_swot_calc_raw[2][i+1] - w_swot_calc_raw[2][i]
    elif i == N2-1:
        dx2 = w_swot_calc_raw[1][i] - w_swot_calc_raw[1][i-1]
        dy2 = w_swot_calc_raw[2][i] - w_swot_calc_raw[2][i-1]
    else:
        dx2 = w_swot_calc_raw[1][i+1] - w_swot_calc_raw[1][i-1]
        dy2 = w_swot_calc_raw[2][i+1] - w_swot_calc_raw[2][i-1]
    angle2 = np.arctan2(dx2, dy2)
    u_swot_calc[i] = - w_swot_calc_raw[0][i] * g_ * np.cos(angle2)
    v_swot_calc[i] = w_swot_calc_raw[0][i] * g_ * np.sin(angle2)
scale_calc = 100000
u_swot_calc_scaled = u_swot_calc * scale_calc
v_swot_calc_scaled = v_swot_calc * scale_calc
This section generates comprehensive visualizations to compare SWOT satellite and MVP in-situ measurements, allowing for qualitative assessment of their agreement and highlighting oceanographic features.
This visualization creates a map comparing geostrophic velocities derived from SWOT Absolute Dynamic Topography (ADT) against those calculated from MVP measurements. This comparison is fundamental to validating the satellite data's ability to capture ocean currents.
The plot construction involves several key steps:
- Map initialization with Basemap: a Mercator projection (projection='merc') with intermediate resolution (resolution='i') for an appropriate level of coastline detail
- Background sea surface height field: the Spectral_r colormap represents ADT anomalies from -10 cm to +10 cm
- Velocity vector overlay: quiver arrows show both the SWOT ADT-derived velocities and the MVP velocities, with a 0.10 m/s reference key
- Contextual elements: the frontal boundary drawn by the plot_fronte function, plus coastlines and graticules

This visualization allows for immediate qualitative assessment of how well the magnitudes, directions, and spatial patterns of the two velocity fields agree.
The map provides crucial insight into how well SWOT captures the geostrophic flow field as compared to direct in-situ measurements, particularly around the frontal zone where strong horizontal gradients are expected.
### 6.1.1 ADT-based velocity vs MVP
min_len_calc = min(N2, len(df_mvp))
plt.figure(figsize=(12, 8))
m = Basemap(
projection='merc',
llcrnrlat=df_mvp['lat'].min() - 0.4,
urcrnrlat=df_mvp['lat'].max() + 0.4,
llcrnrlon=df_mvp['lon'].min() - 0.4,
urcrnrlon=df_mvp['lon'].max() + 0.4,
resolution='i'
)
m.drawcoastlines()
m.drawcountries()
m.drawmapboundary(fill_color='lightblue')
m.fillcontinents(color='beige', lake_color='lightblue')
m.drawparallels(np.arange(-90., 91., 1.), labels=[True, False, False, False])
m.drawmeridians(np.arange(-180., 181., 1.), labels=[False, False, False, True])
x_swot_map, y_swot_map = m(lon_valid, lat_valid)
sc = m.scatter(
x_swot_map, y_swot_map,
c=adt_valid - np.nanmean(adt_valid),
cmap='Spectral_r', vmin=-0.1, vmax=0.1, zorder=1
)
x_pl_map_1, y_pl_map_1 = m(
df_mvp['lon'].values[:min_len_calc],
df_mvp['lat'].values[:min_len_calc]
)
m.scatter(x_pl_map_1, y_pl_map_1, color='black', label=f'PL{PL}', zorder=5, s=10)
q1 = m.quiver(
x_pl_map_1, y_pl_map_1,
u_swot_calc_scaled[:min_len_calc],
v_swot_calc_scaled[:min_len_calc],
angles='xy', scale_units='xy', scale=1,
color=swot_color_adt, zorder=5,
label='SWOT velocity (from ADT)'
)
m.quiver(
x_pl_map_1, y_pl_map_1,
u_mvp_scaled[:min_len_calc],
v_mvp_scaled[:min_len_calc],
angles='xy', scale_units='xy', scale=1,
color=mvp_color, zorder=10, label='MVP velocity'
)
plt.quiverkey(q1, 0.80, 0.1, 0.10 * scale_calc, '0.10 m/s',
labelpos='E', coordinates='axes', color='black')
plt.colorbar(sc, label='SWOT adt (m) [unsmoothed]')
plt.title(f'PL{PL} - {dayv_dt.strftime("%Y-%m-%d")} - (1) ADT-based Velocity vs MVP')
plot_fronte(m)
plt.legend(loc='upper right')
plt.show()
This visualization compares the full SWOT velocity field (provided directly in the SWOT data product) with the MVP-derived velocities. Unlike the previous figure that used velocities calculated from ADT gradients, this plot utilizes the pre-computed geostrophic velocity components (ugos, vgos) from the SWOT database.
The map structure follows the same general layout as the previous figure, with the following key differences:
Vector data source: the arrows now use the pre-computed ugos and vgos fields from the SWOT product rather than velocities recomputed from ADT gradients.
Vector scaling:
u_swot_scaled_tot = u_swot[:min_len_uv] * coeff_scaled
v_swot_scaled_tot = v_swot[:min_len_uv] * coeff_scaled
The scaling coefficient (coeff_scaled, set to 100,000) ensures that the vectors are visually interpretable despite the small magnitude of ocean currents (typically centimeters to tens of centimeters per second).
Coordinate system: Unlike MVP velocities which are naturally perpendicular to the ship track, these SWOT velocities are provided in standard zonal and meridional components. This means they represent the full 2D velocity field rather than just the cross-track component.
This comparison is important because it reveals differences between two approaches to deriving velocity from SWOT data: velocities recomputed locally from along-track ADT gradients, and the operational geostrophic velocity fields distributed with the product.
By comparing both SWOT velocity types against MVP measurements, we can determine which better represents the actual ocean currents. Potential differences might arise from the smoothing and derivative schemes used in the operational processing, from the interpolation onto the transect, and from noise in the ADT field amplified by differentiation.
The color-coded background continues to show the ADT anomaly field, providing context for the velocity vectors and highlighting the relationship between sea surface height patterns and the associated geostrophic flow.
### 6.1.2 Full SWOT velocity vs MVP
min_len_uv = min(len(u_swot), len(df_mvp))
plt.figure(figsize=(12, 8))
m = Basemap(
projection='merc',
llcrnrlat=df_mvp['lat'].min() - 0.4,
urcrnrlat=df_mvp['lat'].max() + 0.4,
llcrnrlon=df_mvp['lon'].min() - 0.4,
urcrnrlon=df_mvp['lon'].max() + 0.4,
resolution='i'
)
m.drawcoastlines()
m.drawcountries()
m.drawmapboundary(fill_color='lightblue')
m.fillcontinents(color='beige', lake_color='lightblue')
m.drawparallels(np.arange(-90., 91., 1.), labels=[True, False, False, False])
m.drawmeridians(np.arange(-180., 181., 1.), labels=[False, False, False, True])
x_pl, y_pl = m(
df_mvp['lon'].values[:min_len_uv],
df_mvp['lat'].values[:min_len_uv]
)
x_swot, y_swot = m(lon_valid, lat_valid)
sc = m.scatter(
x_swot, y_swot,
c=adt_valid - np.nanmean(adt_valid),
cmap='Spectral_r', vmin=-0.1, vmax=0.1, zorder=1
)
m.scatter(x_pl, y_pl, color='black', label=f'PL{PL}', zorder=5, s=10)
u_swot_scaled_tot = u_swot[:min_len_uv] * coeff_scaled
v_swot_scaled_tot = v_swot[:min_len_uv] * coeff_scaled
q2 = m.quiver(
x_pl, y_pl,
u_swot_scaled_tot, v_swot_scaled_tot,
angles='xy', scale_units='xy', scale=1,
color=swot_color_total, zorder=5,
label='SWOT velocity (TOTAL)'
)
m.quiver(
x_pl, y_pl,
u_mvp_scaled[:min_len_uv], v_mvp_scaled[:min_len_uv],
angles='xy', scale_units='xy', scale=1,
color=mvp_color, zorder=10, label='MVP velocity'
)
plt.quiverkey(q2, 0.80, 0.1, 0.10 * coeff_scaled, '0.10 m/s',
labelpos='E', coordinates='axes', color='black')
plt.colorbar(sc, label='SWOT adt (m)')
plt.title(f'PL{PL} - {dayv_dt.strftime("%Y-%m-%d")} - (2) FULL Velocity')
plot_fronte(m)
plt.legend(loc='upper right')
plt.show()
This visualization addresses a fundamental challenge in comparing SWOT and MVP velocity measurements: different reference frames. While SWOT provides velocities in standard zonal/meridional components, MVP naturally measures flow perpendicular to the ship track. This figure projects the SWOT velocities onto the direction perpendicular to the MVP transect for direct comparison.
The projection process involves two main steps:
Calculate scalar projection: The function compute_signed_speed_along_transect determines the signed magnitude of SWOT velocity perpendicular to the transect:
w_swot_proj_sign_map = compute_signed_speed_along_transect(
u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
)
This yields the speed component in the direction normal to the transect, with sign indicating direction.
Convert to vector representation: The scalar projections are transformed back into vector components, but now aligned with the local normal direction:
nx, ny = dy, -dx # Normal vector to transect (perpendicular)
nx, ny = nx / norm_n, ny / norm_n # Normalized
u_swot_proj_map[i] = scalar_proj * nx
v_swot_proj_map[i] = scalar_proj * ny
This approach ensures that the projected SWOT vectors share the MVP vectors' orientation convention, so arrows from the two sources can be compared point by point.
The figure uses the same map layout and ADT background as previous visualizations, but now with the projected SWOT velocities represented by dark green arrows. This projection method offers the most direct comparison between satellite and in-situ measurements by aligning their reference frames.
Most importantly, the visualization provides the most rigorous comparison of SWOT and MVP velocities by ensuring they represent the same physical quantity: the geostrophic flow perpendicular to the transect direction.
### 6.1.3 Projected SWOT velocity vs MVP
w_swot_proj_sign_map = compute_signed_speed_along_transect(
u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
)
u_swot_proj_map = np.zeros(len(df_mvp))
v_swot_proj_map = np.zeros(len(df_mvp))
for i in range(len(df_mvp)):
if len(df_mvp) == 1:
dx = dy = 0
elif i == 0:
dx = df_mvp['lon'].values[i+1] - df_mvp['lon'].values[i]
dy = df_mvp['lat'].values[i+1] - df_mvp['lat'].values[i]
elif i == len(df_mvp) - 1:
dx = df_mvp['lon'].values[i] - df_mvp['lon'].values[i-1]
dy = df_mvp['lat'].values[i] - df_mvp['lat'].values[i-1]
else:
dx = df_mvp['lon'].values[i+1] - df_mvp['lon'].values[i-1]
dy = df_mvp['lat'].values[i+1] - df_mvp['lat'].values[i-1]
nx, ny = dy, -dx
norm_n = np.sqrt(nx**2 + ny**2)
if norm_n < 1e-10:
u_swot_proj_map[i] = 0.
v_swot_proj_map[i] = 0.
else:
nx, ny = nx / norm_n, ny / norm_n
scalar_proj = w_swot_proj_sign_map[i]
u_swot_proj_map[i] = scalar_proj * nx
v_swot_proj_map[i] = scalar_proj * ny
plt.figure(figsize=(12, 8))
m = Basemap(
projection='merc',
llcrnrlat=df_mvp['lat'].min() - 0.4,
urcrnrlat=df_mvp['lat'].max() + 0.4,
llcrnrlon=df_mvp['lon'].min() - 0.4,
urcrnrlon=df_mvp['lon'].max() + 0.4,
resolution='i'
)
m.drawcoastlines()
m.drawcountries()
m.drawmapboundary(fill_color='lightblue')
m.fillcontinents(color='beige', lake_color='lightblue')
m.drawparallels(np.arange(-90., 91., 1.), labels=[True, False, False, False])
m.drawmeridians(np.arange(-180., 181., 1.), labels=[False, False, False, True])
x_swot, y_swot = m(lon_valid, lat_valid)
sc = m.scatter(
x_swot, y_swot,
c=adt_valid - np.nanmean(adt_valid),
cmap='Spectral_r', vmin=-0.1, vmax=0.1, zorder=1
)
x_pl, y_pl = m(df_mvp['lon'].values, df_mvp['lat'].values)
m.scatter(x_pl, y_pl, color='black', label=f'PL{PL}', zorder=5, s=10)
u_swot_proj_scaled_map = u_swot_proj_map * coeff_scaled
v_swot_proj_scaled_map = v_swot_proj_map * coeff_scaled
q3 = m.quiver(
x_pl, y_pl,
u_swot_proj_scaled_map, v_swot_proj_scaled_map,
angles='xy', scale_units='xy', scale=1,
color=swot_color_proj, zorder=5,
label='SWOT velocity (Projected)'
)
m.quiver(
x_pl, y_pl,
u_mvp_scaled, v_mvp_scaled,
angles='xy', scale_units='xy', scale=1,
color=mvp_color, zorder=10, label='MVP velocity'
)
plt.quiverkey(q3, 0.80, 0.1, 0.10 * coeff_scaled, '0.10 m/s',
labelpos='E', coordinates='axes', color='black')
plt.colorbar(sc, label='SWOT adt (m)')
plt.title(
f'PL{PL} - {dayv_dt.strftime("%Y-%m-%d")} - (3) PROJECTED Velocity (New Method)'
)
plot_fronte(m)
plt.legend(loc='upper right')
plt.show()
While map-based visualizations provide spatial context, one-dimensional profile plots enable quantitative comparison of velocity magnitudes along the transect. This figure displays signed velocity components from multiple sources plotted against distance along the MVP transect.
The code first calculates signed velocity components from five different sources:
- MVP (`w_mvp_sign`): the reference in-situ measurement, naturally perpendicular to the transect
- SWOT total (`w_swot_tot_sign`): full magnitude of the SWOT velocity, with sign based on the cross-track direction
- SWOT projected (`w_swot_proj_sign`): only the component perpendicular to the transect
- SWOT ADT-based (`w_swot_calc_sign`): velocity calculated directly from SWOT ADT gradients
- DUACS (`w_duacs_sign_full`): velocity from the multi-mission merged altimetry product

For the DUACS velocity, the code interpolates DUACS ADT onto the track with the `load_DUACS` function, derives geostrophic velocities with the `geostrophic_velocity` function, and then projects them onto the transect normal. The negative signs in the calculations (e.g., `w_swot_tot_sign = -compute_signed_speed_tot(...)`) ensure consistent orientation of the velocities, with positive values representing flow in the same direction for all sources.
The resulting line plot overlays all five signed-velocity profiles against distance along the transect, on common axes and with a distinct marker and color per source.
Most importantly, this plot quantifies discrepancies between satellite and in-situ measurements in a way that the map visualizations cannot, enabling statistical analysis of the differences. Sharp transitions in the profiles often indicate crossing oceanographic fronts or the edges of eddies, with the agreement between profiles at these features being particularly important for validating SWOT's ability to resolve fine-scale ocean dynamics.
### 6.2.1 Signed velocity profiles
w_swot_tot_sign = - compute_signed_speed_tot(
u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
)
w_swot_proj_sign = - compute_signed_speed_along_transect(
u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
)
w_mvp_sign = - compute_signed_speed_along_transect(
df_mvp['u_mvp'].values, df_mvp['v_mvp'].values,
df_mvp['lon'].values, df_mvp['lat'].values
)
w_swot_calc_sign = - compute_signed_speed_along_transect(
u_swot_calc, v_swot_calc,
df_mvp['lon'].values[:N2], df_mvp['lat'].values[:N2]
)
duacsdir = '/Volumes/Extreme SSD/MIO/DUACS/'
savefig = ''
# DUACS reference velocity
adt_duacs = load_DUACS(df_mvp, dayv, duacsdir, savefig)
if adt_duacs is not None:
w_duacs_raw = gsw.geostrophy.geostrophic_velocity(
adt_duacs, df_mvp['lon'].values, df_mvp['lat'].values
)
N_duacs = len(w_duacs_raw[0])
u_duacs = np.zeros(N_duacs)
v_duacs = np.zeros(N_duacs)
for i in range(N_duacs):
if N_duacs == 1:
dx_d = dy_d = 0
elif i == 0:
dx_d = w_duacs_raw[1][i+1] - w_duacs_raw[1][i]
dy_d = w_duacs_raw[2][i+1] - w_duacs_raw[2][i]
elif i == N_duacs - 1:
dx_d = w_duacs_raw[1][i] - w_duacs_raw[1][i-1]
dy_d = w_duacs_raw[2][i] - w_duacs_raw[2][i-1]
else:
dx_d = w_duacs_raw[1][i+1] - w_duacs_raw[1][i-1]
dy_d = w_duacs_raw[2][i+1] - w_duacs_raw[2][i-1]
angle_d = np.arctan2(dx_d, dy_d)
u_duacs[i] = - w_duacs_raw[0][i] * g_ * np.cos(angle_d)
v_duacs[i] = w_duacs_raw[0][i] * g_ * np.sin(angle_d)
w_duacs_sign_full = - compute_signed_speed_along_transect(
u_duacs, v_duacs,
df_mvp['lon'].values, df_mvp['lat'].values
)
else:
w_duacs_sign_full = np.full(len(df_mvp), np.nan)
min_len_4 = min(
len(w_mvp_sign), len(w_swot_tot_sign), len(w_swot_proj_sign),
len(w_swot_calc_sign), len(w_duacs_sign_full)
)
x_4 = df_mvp['dist'].values[:min_len_4]
w_mvp_4 = w_mvp_sign[:min_len_4]
w_swot_tot_4 = w_swot_tot_sign[:min_len_4]
w_swot_proj_4 = w_swot_proj_sign[:min_len_4]
w_swot_calc_4 = w_swot_calc_sign[:min_len_4]
w_duacs_4 = w_duacs_sign_full[:min_len_4]
duacs_color = "magenta"
plt.figure()
plt.plot(x_4, w_mvp_4, 'o-', color=mvp_color, label='MVP (signed)')
plt.plot(x_4, w_swot_tot_4, 's-', color=swot_color_total, label='SWOT TOTAL (signed)')
plt.plot(x_4, w_swot_proj_4, '^-', color=swot_color_proj, label='SWOT Projected (signed)')
plt.plot(x_4, w_swot_calc_4, 'd-', color=swot_color_adt, label='SWOT from ADT (signed)')
plt.plot(x_4, w_duacs_4, 'p-', color=duacs_color, label='DUACS (signed)')
plt.grid(True)
plt.legend()
plt.xlabel('Distance along transect (km)')
plt.ylabel('Velocity (m/s) [signed]')
plt.title(
f'PL{PL} - (4) Signed Velocity Comparison '
'(MVP, TOTAL, Projected, ADT-based, DUACS)'
)
plt.show()
This section prepares additional datasets for detailed diagnostic visualizations that directly compare sea surface height measurements from different sources. These comparisons are essential for understanding the agreement between satellite altimetry products and in-situ observations.
The code first loads interpolated ADT data from both SWOT and DUACS at MVP sampling points:
adt_swot_extra = load_SWOT(df_mvp, dayv, swotdir, savefig)
adt_duacs = load_DUACS(df_mvp, dayv, duacsdir, savefig)
For each dataset, two versions are prepared:
- Raw values (`adt_swot_extra`, `adt_duacs`): the absolute dynamic topography values directly from the altimetry products
- Centered anomalies (`adtc_swot_extra`, `adtc_duacs`): anomalies calculated by subtracting the mean value from each dataset:

adtmean_swot_extra = np.nanmean(adt_swot_extra)
adtc_swot_extra = adt_swot_extra - adtmean_swot_extra
This centering operation is important because it removes the constant offsets that stem from each product's different reference surface, so the subsequent comparisons focus on spatial patterns rather than absolute values.
The final step ensures all arrays are trimmed to the same length by limiting them to the length of the MVP dataframe:
L_full = len(df_mvp)
adt_swot_extra = adt_swot_extra[:L_full]
adtc_swot_extra = adtc_swot_extra[:L_full]
adt_duacs = adt_duacs[:L_full]
adtc_duacs = adtc_duacs[:L_full]
These prepared datasets will be used in the subsequent visualization plots to directly compare the raw dynamic topography and the centered anomalies from MVP, SWOT, and DUACS.
Such comparisons are vital for validating satellite measurements against in-situ data and understanding the strengths and limitations of different altimetry products.
## 6.3 Extra diagnostic plots
swotdir = '/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/'
savefig = '/Volumes/Extreme SSD/MIO/MVP_TOOLS/BIOSWOTMED/figures'
adt_swot_extra = load_SWOT(df_mvp, dayv, swotdir, savefig)
if adt_swot_extra is None:
continue
adtmean_swot_extra = np.nanmean(adt_swot_extra)
adtc_swot_extra = adt_swot_extra - adtmean_swot_extra
adt_duacs = load_DUACS(df_mvp, dayv, duacsdir, savefig)
if adt_duacs is None:
continue
adtmean_duacs = np.nanmean(adt_duacs)
adtc_duacs = adt_duacs - adtmean_duacs
# Trim satellite arrays to MVP transect length
L_full = len(df_mvp)
adt_swot_extra = adt_swot_extra[:L_full]
adtc_swot_extra = adtc_swot_extra[:L_full]
adt_duacs = adt_duacs[:L_full]
adtc_duacs = adtc_duacs[:L_full]
This diagnostic plot provides a direct comparison between raw dynamic topography and centered anomalies from three different data sources (MVP, SWOT, and DUACS) along the MVP transect. This visualization is crucial for understanding how different reference levels affect the comparison between satellite and in-situ measurements.
The figure contains six line plots arranged in three pairs:

MVP measurements:
- centered dynamic height anomaly (`dhc`, solid)
- raw dynamic height (`dh`, dashed)

These represent the in-situ reference measurements calculated from MVP temperature and salinity profiles.

SWOT measurements:
- centered ADT anomaly (`adtc_swot_extra`, solid)
- raw ADT (`adt_swot_extra`, dashed)

These show the sea surface height measured by the SWOT satellite.

DUACS measurements:
- centered ADT anomaly (`adtc_duacs`, solid)
- raw ADT (`adt_duacs`, dashed)

These represent the merged multi-mission altimetry product, providing a well-established reference.
A horizontal gray line at y=0 provides a reference for the centered anomalies.
This visualization reveals several important aspects:
Vertical offsets: The raw topography lines (dashed) show substantial vertical shifts between datasets, highlighting the different reference levels used by each source. These offsets arise primarily because MVP dynamic height is computed relative to a reference pressure level, whereas satellite ADT is referenced to the geoid via a mean dynamic topography (MDT) model.
Pattern agreement: Despite the vertical offsets, the centered anomalies (solid lines) should show similar spatial patterns if all sources are capturing the same oceanographic features.
Amplitude differences: Variations in the amplitude of oscillations between datasets may indicate differences in spatial resolution, smoothing, or the sensitivity to fine-scale features.
By examining both raw and centered versions together, we can distinguish between issues related to reference levels (affecting the absolute values) and those related to the actual measurement of oceanographic features (affecting the spatial patterns). This distinction is crucial for proper validation of SWOT measurements against in-situ observations.
### 6.3.1 Centred anomalies and raw topography
plt.figure()
plt.plot(df_mvp['dist'], df_mvp['dhc'],
label='MVP (centered adt)', color=mvp_color, zorder=5)
plt.plot(df_mvp['dist'], df_mvp['dh'],
'--', label='MVP (raw topography)', color=mvp_color, zorder=5)
plt.plot(df_mvp['dist'], adtc_swot_extra,
label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
plt.plot(df_mvp['dist'], adt_swot_extra,
'--', label='SWOT (raw topography)', color=swot_color_adt, zorder=3)
plt.plot(df_mvp['dist'], adtc_duacs,
label='DUACS (centered adt)', color=swot_color_total, zorder=2)
plt.plot(df_mvp['dist'], adt_duacs,
'--', label='DUACS (raw topography)', color=swot_color_total, zorder=2)
plt.plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
'-', color='grey', zorder=1)
plt.xlabel('Distance along transect (km)')
plt.ylabel('Dynamic Topography Anomaly (m)')
plt.legend(loc='lower right', fontsize=7)
plt.title(f'PL{PL} - Extra Plot 1: Centered anomalies vs raw topography')
plt.show()
This figure isolates the raw (non-centered) dynamic topography measurements from the three different sources, allowing for direct comparison of absolute values along the transect. By focusing only on the raw measurements, we can better assess the absolute differences between satellite and in-situ observations.
The plot contains three lines:

- MVP raw dynamic height (red line)
- SWOT raw ADT (blue line)
- DUACS raw ADT (green line)
The vertical offsets between these measurements are particularly informative:
MVP vs. Satellite offsets: Differences between MVP and satellite measurements often result from different reference levels. The MVP dynamic height is calculated relative to a reference pressure level (typically the deepest measurement), while satellite ADT is referenced to the geoid.
SWOT vs. DUACS offsets: Differences between the two satellite products might indicate variations in the MDT models used, calibration approaches, or processing methodologies.
The shapes of the curves, independent of their vertical positions, reveal how well each dataset captures the spatial variations in dynamic topography. Similar patterns with consistent vertical offsets suggest good agreement in detecting oceanographic features, while divergent patterns would indicate more fundamental measurement differences.
This analysis of raw topography is essential for understanding the absolute calibration of different measurement systems and identifying systematic biases that might need correction in the satellite products.
### 6.3.2 Raw topography comparison
plt.figure()
plt.plot(df_mvp['dist'], df_mvp['dh'],
label='MVP (raw topography)', color=mvp_color, zorder=5)
plt.plot(df_mvp['dist'], adt_swot_extra,
label='SWOT (raw topography)', color=swot_color_adt, zorder=3)
plt.plot(df_mvp['dist'], adt_duacs,
label='DUACS (raw topography)', color=swot_color_total, zorder=2)
plt.xlabel('Distance along transect (km)')
plt.ylabel('Dynamic Topography Anomaly (m)')
plt.legend(loc='lower right', fontsize=7)
plt.title(f'PL{PL} - Extra Plot 2: Raw dynamic topography comparison')
plt.show()
This visualization focuses exclusively on the centered anomalies of dynamic topography from MVP, SWOT, and DUACS, allowing for a direct comparison of spatial patterns without the influence of different reference levels. By centering each dataset (subtracting its mean value), we can more clearly assess how well the satellite measurements capture the relative variations in sea surface height.
The plot contains three lines representing centered anomalies:

- MVP centered dynamic height (`dhc`, red line)
- SWOT centered ADT (`adtc_swot_extra`, blue line)
- DUACS centered ADT (`adtc_duacs`, green line)
With the centering operation, all three curves should theoretically align if they perfectly capture the same oceanographic features. The degree of alignment therefore indicates the quality of the satellite measurements relative to in-situ observations.
The key aspects to analyze in this figure are the degree of alignment between the three curves, the relative amplitude of their variations, and any spatial shifts of features along the transect.
This centered comparison is particularly valuable for evaluating SWOT's ability to resolve mesoscale and submesoscale oceanographic features, which is one of the primary scientific objectives of the mission. It helps quantify how well SWOT captures the spatial gradients that drive geostrophic currents, independent of absolute calibration issues.
### 6.3.3 Centered anomalies comparison
plt.figure()
plt.plot(df_mvp['dist'], df_mvp['dhc'],
label='MVP (centered adt)', color=mvp_color, zorder=5)
plt.plot(df_mvp['dist'], adtc_swot_extra,
label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
plt.plot(df_mvp['dist'], adtc_duacs,
label='DUACS (centered adt)', color=swot_color_total, zorder=2)
plt.xlabel('Distance along transect (km)')
plt.ylabel('Dynamic Topography Anomaly (m)')
plt.legend(loc='lower right', fontsize=7)
plt.title(f'PL{PL} - Extra Plot 3: Centered anomalies comparison')
plt.show()
This multi-panel figure provides a comprehensive comparison of dynamic topography measurements by combining the centered anomalies in the upper panel with their pairwise differences in the lower panel. This approach enables both qualitative observation of patterns and quantitative assessment of discrepancies.
The figure contains two vertically stacked panels sharing the same x-axis (distance along transect).

Upper panel: similar to the previous plot, it displays the centered dynamic topography anomalies from the three sources (MVP, SWOT, and DUACS). A horizontal gray line at y=0 provides a reference baseline.

Lower panel: it displays three difference curves, quantifying the discrepancies between datasets:

- SWOT - MVP (blue line)
- DUACS - MVP (green line)
- SWOT - DUACS (orange line)
Both panels have consistent y-axis limits to facilitate comparison, with anomalies ranging from -0.08 to 0.08 meters and differences ranging from -0.06 to 0.06 meters.
By pairing the anomalies with their pairwise differences, this combined visualization provides an optimal framework for comprehensive validation of SWOT measurements against both in-situ data and established satellite products.
### 6.3.4 Subplots – centred anomalies & differences (I)
fig, axs = plt.subplots(2, 1, sharex=True)
axs[0].plot(df_mvp['dist'], df_mvp['dhc'],
label='MVP (centered adt)', color=mvp_color, zorder=5)
axs[0].plot(df_mvp['dist'], adtc_swot_extra,
label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
axs[0].plot(df_mvp['dist'], adtc_duacs,
label='DUACS (centered adt)', color=swot_color_total, zorder=2)
axs[0].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
'-', color='grey', zorder=1)
axs[0].set_ylabel('Dynamic Topography Anomaly (m)')
axs[0].legend(loc='lower right', fontsize=7)
axs[0].set_ylim(-0.08, 0.08)
axs[1].plot(df_mvp['dist'], adtc_swot_extra - df_mvp['dhc'],
label='SWOT - MVP', color=swot_color_adt, zorder=3)
axs[1].plot(df_mvp['dist'], adtc_duacs - df_mvp['dhc'],
label='DUACS - MVP', color=swot_color_total, zorder=3)
axs[1].plot(df_mvp['dist'], adtc_swot_extra - adtc_duacs,
label='SWOT - DUACS', color='orange', zorder=3)
axs[1].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
'-', color='grey', zorder=1)
axs[1].set_xlabel('Distance (km)')
axs[1].set_ylabel('Measurement difference (m)')
axs[1].legend(loc='lower right', fontsize=7)
axs[1].set_ylim(-0.06, 0.06)
plt.tight_layout()
plt.suptitle(
f'PL{PL} - Extra Plot 4: Subplots – Centered anomalies & differences (I)',
y=1.02
)
plt.show()
This figure provides an alternative perspective on the differences between measurements by inverting two of the difference calculations in the lower panel. While the upper panel remains identical to the previous figure, the lower panel now shows differences from MVP's perspective rather than from the satellite products' perspective.
As in the previous figure, this panel displays the centered dynamic topography anomalies from three sources: MVP (`dhc`), SWOT (`adtc_swot_extra`), and DUACS (`adtc_duacs`).
The key distinction in this figure is that two of the difference calculations have been inverted:
- MVP - SWOT (blue line): `df_mvp['dhc'] - adtc_swot_extra` (reversed from the previous figure)
- MVP - DUACS (green line): `df_mvp['dhc'] - adtc_duacs` (reversed from the previous figure)
- SWOT - DUACS (orange line): `adtc_swot_extra - adtc_duacs` (unchanged)
This reversed perspective is valuable because:
Reference framing: It positions the MVP as the reference measurement, which aligns with validation methodology where in-situ data is typically considered the ground truth.
Error interpretation: It allows for more intuitive interpretation of where satellite products may be underestimating or overestimating dynamic topography relative to direct oceanographic measurements.
Complementary view: When analyzed alongside the previous figure, it provides a complete picture of the relationships between the three datasets.
The consistent y-axis limits (-0.06 to 0.06 meters for differences) allow for direct comparison with the previous figure, making it easy to see that the blue and green lines are simply inverted while the orange line remains the same.
This alternative presentation of differences highlights the importance of considering the reference frame when validating satellite measurements and provides additional insights into the nature and magnitude of discrepancies between different measurement systems.
### 6.3.5 Subplots – centred anomalies & differences (II)
fig, axs = plt.subplots(2, 1, sharex=True)
axs[0].plot(df_mvp['dist'], df_mvp['dhc'],
label='MVP (centered adt)', color=mvp_color, zorder=5)
axs[0].plot(df_mvp['dist'], adtc_swot_extra,
label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
axs[0].plot(df_mvp['dist'], adtc_duacs,
label='DUACS (centered adt)', color=swot_color_total, zorder=2)
axs[0].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
'-', color='grey', zorder=1)
axs[0].set_ylabel('Dynamic Topography Anomaly (m)')
axs[0].legend(loc='lower right', fontsize=7)
axs[0].set_ylim(-0.08, 0.08)
axs[1].plot(df_mvp['dist'], df_mvp['dhc'] - adtc_swot_extra,
label='MVP - SWOT', color=swot_color_adt, zorder=3)
axs[1].plot(df_mvp['dist'], df_mvp['dhc'] - adtc_duacs,
label='MVP - DUACS', color=swot_color_total, zorder=3)
axs[1].plot(df_mvp['dist'], adtc_swot_extra - adtc_duacs,
label='SWOT - DUACS', color='orange', zorder=3)
axs[1].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
'-', color='grey', zorder=1)
axs[1].set_xlabel('Distance (km)')
axs[1].set_ylabel('Measurement difference (m)')
axs[1].legend(loc='lower right', fontsize=7)
axs[1].set_ylim(-0.06, 0.06)
plt.tight_layout()
plt.suptitle(
f'PL{PL} - Extra Plot 5: Subplots – Centered anomalies & differences (II)',
y=1.02
)
plt.show()
# 1. IMPORT LIBRARIES
## 1.1 Core scientific computing
import gsw # Gibbs Seawater toolbox
import numpy as np
import scipy.io as sio
## 1.2 Geospatial processing
import geopy.distance
from mpl_toolkits.basemap import Basemap
## 1.3 Visualisation tools
import matplotlib.pyplot as plt
## 1.4 Data handling and I/O
import pandas as pd
import glob
import os
from datetime import datetime, timedelta
import xarray as xr
import netCDF4 as nc
## 1.5 Interpolation utilities
from scipy.interpolate import griddata
from scipy.spatial import cKDTree, ConvexHull
## 1.6 Pattern matching and progress monitoring
import re
from tqdm import tqdm
## 1.7 Statistical diagnostics
import statsmodels.api as sm # diagnostic plots
## 1.8 Colour palette for plots
swot_color_adt = "blue" # SWOT velocities derived from ADT
swot_color_total = "green" # SWOT total velocities
swot_color_proj = "#006400" # SWOT projected velocities
mvp_color = "red" # MVP velocities
# 2. UTILITY FUNCTION DEFINITIONS
## 2.1 File loading and preprocessing
### 2.1.1 Generic loader for SSH and velocity fields
def loadsshuv(fname, rep, varn, **kwargs):
"""
Load lon/lat, SSH, U, V and ADT fields from a NetCDF file,
with optional geographic sub-setting.
"""
if 'type' in kwargs:
if kwargs['type'] == 'swot':
filei2 = glob.glob(rep + fname)[0]
ds = xr.open_dataset(filei2)
if 'area' in kwargs:
area = kwargs['area']
selection = ((ds.longitude > area[0]) &
(ds.longitude < area[2]) &
(ds.latitude > area[1]) &
(ds.latitude < area[3]))
ds = ds.where(selection)
lons = ds.longitude.to_masked_array()
lats = ds.latitude.to_masked_array()
us = ds.ugos.to_masked_array()
vs = ds.vgos.to_masked_array()
ssha = ds.ssha_filtered.to_masked_array()
mss = ds.mss.to_masked_array()
mdt = ds.mdt.to_masked_array()
# Remove rows that contain only NaNs
mask_row = np.all(np.isnan(us), axis=1)
u = us[~mask_row, :]
v = vs[~mask_row, :]
lon = lons[~mask_row, :]
lat = lats[~mask_row, :]
ssha = ssha[~mask_row, :]
mss = mss[~mask_row, :]
mdt = mdt[~mask_row, :]
# Absolute Dynamic Topography (ADT)
adt = ssha + mdt
else:
filei = glob.glob(rep + fname)[0]
ds = xr.open_dataset(filei)
lon = ds[varn['longitude']][:]
lat = ds[varn['latitude']][:]
ssha = ds[varn['ssh']][:]
adt = ds['adt'][:]
u = ds[varn['u']][:]
v = ds[varn['v']][:]
else:
filei = glob.glob(rep + fname)[0]
ds = xr.open_dataset(filei)
lon = ds[varn['longitude']][:]
lat = ds[varn['latitude']][:]
ssha = ds[varn['ssh']][:]
adt = ds['adt'][:]
u = ds[varn['u']][:]
v = ds[varn['v']][:]
return {
'lon' : lon,
'lat' : lat,
'ssh' : ssha,
'u' : u,
'v' : v,
'ssha': ssha,
'adt' : adt
}
### 2.1.2 Julian-day to calendar conversion
def jj2date(data, origin):
"""
Convert fractional Julian days into calendar dates (YYYY-MM-DD hh:mm:ss).
"""
data = np.asarray(data)
origin = pd.to_datetime(origin)
times = []
for d in data:
day = (origin + pd.to_timedelta(d, unit='D')).strftime('%Y-%m-%d')
h = int((d % 1) * 24)
m = int(((d % 1) * 24 % 1) * 60)
s = int((((d % 1) * 24 % 1) * 60 % 1) * 60)
times.append(f"{day} {h:02d}:{m:02d}:{s:02d}")
return np.array(times)
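A quick sanity check of this conversion, restating the function so the snippet runs standalone (the origin date here is an arbitrary example, not campaign data):

```python
import numpy as np
import pandas as pd

def jj2date(data, origin):
    """Convert fractional Julian days into 'YYYY-MM-DD hh:mm:ss' strings."""
    data = np.asarray(data)
    origin = pd.to_datetime(origin)
    times = []
    for d in data:
        # Calendar date from the full offset; clock time from the fractional part
        day = (origin + pd.to_timedelta(d, unit='D')).strftime('%Y-%m-%d')
        h = int((d % 1) * 24)
        m = int(((d % 1) * 24 % 1) * 60)
        s = int((((d % 1) * 24 % 1) * 60 % 1) * 60)
        times.append(f"{day} {h:02d}:{m:02d}:{s:02d}")
    return np.array(times)

# 1.5 days after midnight on 2023-01-01 is noon on 2023-01-02
print(jj2date([1.5], '2023-01-01')[0])  # → 2023-01-02 12:00:00
```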
### 2.1.3 Recursive flattening of nested structures
def extract_values(nested_array):
"""
Flatten arbitrarily nested arrays, tuples or lists into a single list.
"""
values = []
def recurse(arr):
for element in arr:
if isinstance(element, np.ndarray):
if element.dtype == 'O':
recurse(element)
else:
values.append(element.item())
elif isinstance(element, tuple):
for item in element:
recurse(item)
else:
values.append(element)
recurse(nested_array)
return values
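Note that the flattener assumes numeric leaves are scalar-sized arrays (as in MATLAB-style structures loaded via `scipy.io`): `element.item()` would raise on a multi-element numeric array. A minimal standalone check with a hypothetical nested object array:

```python
import numpy as np

def extract_values(nested_array):
    """Flatten arbitrarily nested arrays, tuples or lists into a single list."""
    values = []
    def recurse(arr):
        for element in arr:
            if isinstance(element, np.ndarray):
                if element.dtype == 'O':
                    recurse(element)
                else:
                    values.append(element.item())  # numeric leaves must be scalar-sized
            elif isinstance(element, tuple):
                for item in element:
                    recurse(item)
            else:
                values.append(element)
    recurse(nested_array)
    return values

# Object array nesting two scalar-sized float arrays plus a plain number
nested = np.array([np.array([1.0]), np.array([2.0]), 3.0], dtype=object)
print(extract_values(nested))  # → [1.0, 2.0, 3.0]
```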
## 2.2 Vector manipulation utilities
### 2.2.1 Planar rotation
def rotate_vector(u, v, angle_deg):
"""
Rotate the vector (u, v) by angle_deg degrees.
"""
angle_rad = np.deg2rad(angle_deg)
u_rot = u * np.cos(angle_rad) - v * np.sin(angle_rad)
v_rot = u * np.sin(angle_rad) + v * np.cos(angle_rad)
return u_rot, v_rot
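The formula is the standard counter-clockwise planar rotation; for example, a 90° rotation maps the unit zonal vector onto the meridional axis:

```python
import numpy as np

def rotate_vector(u, v, angle_deg):
    """Rotate the vector (u, v) counter-clockwise by angle_deg degrees."""
    angle_rad = np.deg2rad(angle_deg)
    u_rot = u * np.cos(angle_rad) - v * np.sin(angle_rad)
    v_rot = u * np.sin(angle_rad) + v * np.cos(angle_rad)
    return u_rot, v_rot

# East (1, 0) rotated by +90° becomes north (0, 1)
u_rot, v_rot = rotate_vector(1.0, 0.0, 90.0)
print(round(float(u_rot), 6), round(float(v_rot), 6))  # → 0.0 1.0
```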
### 2.2.2 Moving average (centred window)
def moving_average(x, interval):
"""
Compute a moving average of x over a centred window of length = interval.
"""
start = (interval - 1) // 2
liste_moyennes = []
for i in range(start, len(x) - start):
liste_moyennes.append(np.mean(x[i - start:i + start + 1]))
return liste_moyennes
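Because the window is not padded at the edges, the output is shorter than the input by `interval - 1` samples; a standalone restatement with example input:

```python
import numpy as np

def moving_average(x, interval):
    """Centred moving average; output shrinks by interval - 1 samples."""
    start = (interval - 1) // 2
    out = []
    for i in range(start, len(x) - start):
        # Mean over the centred window [i - start, i + start]
        out.append(np.mean(x[i - start:i + start + 1]))
    return out

# Averages of the windows [1,2,3], [2,3,4], [3,4,5]
smoothed = moving_average([1, 2, 3, 4, 5], 3)
```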
### 2.2.3 Projection of one vector onto another
def project_vector(uA, vA, uB, vB):
"""
Project vector (uA, vA) onto the direction of (uB, vB).
"""
normB = np.sqrt(uB**2 + vB**2)
if normB == 0:
return 0., 0.
uB_hat, vB_hat = uB / normB, vB / normB
scalar_proj = uA*uB_hat + vA*vB_hat
return scalar_proj * uB_hat, scalar_proj * vB_hat
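For instance, projecting a (1, 1) current onto the zonal axis keeps only its u part; a standalone check with example vectors:

```python
import numpy as np

def project_vector(uA, vA, uB, vB):
    """Project vector (uA, vA) onto the direction of (uB, vB)."""
    normB = np.sqrt(uB**2 + vB**2)
    if normB == 0:
        return 0., 0.
    uB_hat, vB_hat = uB / normB, vB / normB
    scalar_proj = uA * uB_hat + vA * vB_hat
    return scalar_proj * uB_hat, scalar_proj * vB_hat

# (1, 1) projected onto the zonal direction (2, 0) keeps only the u component
u_p, v_p = project_vector(1.0, 1.0, 2.0, 0.0)
print(float(u_p), float(v_p))  # → 1.0 0.0
```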
### 2.2.4 Date extraction from SWOT filenames
def extract_key_swot(filename):
match = re.search(r'\d{8}T\d{6}', filename)
if match:
return datetime.strptime(match.group(0)[:8], '%Y%m%d').date()
return None
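The regex picks out the first `YYYYMMDDThhmmss` token; a standalone check on a hypothetical filename following the SWOT_L3_LR_SSH naming pattern (the cycle and pass numbers below are made up):

```python
import re
from datetime import datetime

def extract_key_swot(filename):
    """Extract the acquisition date from the first YYYYMMDDThhmmss token."""
    match = re.search(r'\d{8}T\d{6}', filename)
    if match:
        return datetime.strptime(match.group(0)[:8], '%Y%m%d').date()
    return None

# Hypothetical filename in the SWOT_L3_LR_SSH_Expert style
fname = 'SWOT_L3_LR_SSH_Expert_006_20230415T123456_20230415T132544_v2.0.0.nc'
print(extract_key_swot(fname))  # → 2023-04-15
```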
# 3. SATELLITE AND AUXILIARY DATA HANDLING
## 3.1 SWOT and DUACS co-location
### 3.1.1 SWOT interpolation onto MVP positions
def load_SWOT(mvp, dayv, swotdir, savefig, **kwargs):
"""
Load SWOT data and interpolate ADT onto MVP sampling points.
"""
domain = [np.floor(np.min(mvp['lon'])) - 0.5,
np.floor(np.min(mvp['lat'])) - 0.5,
np.ceil(np.max(mvp['lon'])) + 0.5,
np.ceil(np.max(mvp['lat'])) + 0.5]
dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
varn = {'longitude': 'longitude', 'latitude': 'latitude',
'u': 'ugos', 'v': 'vgos', 'ssh': 'ssha_filtered'}
fname = glob.glob(swotdir + 'SWOT_L3_LR_SSH_Expert_*_' +
dayv_dt.strftime('%Y%m%d') + '*_*.nc')
if not fname:
print(f"No SWOT file found in load_SWOT for date {dayv}")
return None
swot = loadsshuv(fname[0], '', varn, type='swot', area=domain)
points = np.array((swot['lon'].flatten(), swot['lat'].flatten())).T
valuesadt = swot['adt'].flatten()
adtc = griddata(points, valuesadt,
(mvp['lon'], mvp['lat']), method='linear')
return np.squeeze(adtc)
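The co-location step in both loaders boils down to a scattered linear interpolation with `scipy.interpolate.griddata`. A self-contained sketch with a synthetic planar ADT field, for which linear interpolation is exact (the grid extent and track points are hypothetical, not campaign values):

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic 2-D grid standing in for swath lon/lat (hypothetical domain)
lon_g, lat_g = np.meshgrid(np.linspace(5.0, 7.0, 21), np.linspace(42.0, 44.0, 21))
adt_g = 0.05 * lon_g - 0.02 * lat_g  # planar "ADT" field in metres

# Hypothetical MVP track points inside the grid
lon_mvp = np.array([5.3, 5.9, 6.4])
lat_mvp = np.array([42.5, 43.0, 43.6])

# Same call pattern as load_SWOT / load_DUACS: flatten grid, interpolate on track
points = np.column_stack((lon_g.ravel(), lat_g.ravel()))
adt_on_track = griddata(points, adt_g.ravel(), (lon_mvp, lat_mvp), method='linear')

# Linear interpolation reproduces a planar field exactly
expected = 0.05 * lon_mvp - 0.02 * lat_mvp
print(np.allclose(adt_on_track, expected))  # → True
```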
### 3.1.2 DUACS interpolation onto MVP positions
def load_DUACS(mvp, dayv, duacsdir, savefig, **kwargs):
"""
Load DUACS data and interpolate ADT onto MVP sampling points.
"""
dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
varn = {'longitude': 'longitude', 'latitude': 'latitude',
'u': 'ugos', 'v': 'vgos', 'ssh': 'sla'}
fname = glob.glob(duacsdir + 'nrt_europe_allsat_phy_l4_' +
dayv_dt.strftime('%Y%m%d') + '_*.nc')
if not fname:
print(f"No DUACS file found in load_DUACS for date {dayv}")
return None
duacs = loadsshuv(fname[0], '', varn)
X, Y = np.meshgrid(duacs['lon'].values, duacs['lat'].values)
points = np.array((X.flatten(), Y.flatten())).T
valuesadt = duacs['adt'].values.flatten()
adtd = griddata(points, valuesadt,
(mvp['lon'], mvp['lat']), method='linear')
return np.squeeze(adtd)
## 3.2 SWOT file catalogue and acquisition dates
swot_files = glob.glob('/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/'
'SWOT_L3_LR_SSH_Expert_*_v2.0.0.nc')
swot_files.sort(key=extract_key_swot)
dates_swot = []
for files in swot_files:
filename = os.path.basename(files)
date_strs = os.path.splitext(filename)[0].split('_')[7:8]
for date_str in date_strs:
dates_swot.append(datetime.strptime(date_str[:8], '%Y%m%d').date())
## 3.3 Frontal region convex-hull computation
df_f = pd.read_csv("/Volumes/Extreme SSD/MIO/tsg_crown_AFB.csv")
region_f_df = df_f[df_f['Region'] == 'F']
coords_f = region_f_df[['Longitude', 'Latitude']].dropna().values
if len(coords_f) > 2:
hull_f = ConvexHull(coords_f)
polygon_points_f = coords_f[hull_f.vertices]
else:
polygon_points_f = coords_f
### 3.3.1 Frontal hull plotting
def plot_fronte(m):
"""
Draw the convex hull of the front in violet.
"""
if len(polygon_points_f) < 2:
return
for i in range(len(polygon_points_f)):
j = (i + 1) % len(polygon_points_f)
lon1, lat1 = polygon_points_f[i]
lon2, lat2 = polygon_points_f[j]
x1, y1 = m(lon1, lat1)
x2, y2 = m(lon2, lat2)
style = '--' if abs(lon2 - lon1) > abs(lat2 - lat1) else ':'
if i == 0:
m.plot([x1, x2], [y1, y2], color='purple',
linestyle=style, linewidth=2,
zorder=15, label='Front')
else:
m.plot([x1, x2], [y1, y2], color='purple',
linestyle=style, linewidth=2, zorder=15)
# 4. TRANSECT-BASED VELOCITY DIAGNOSTICS
## 4.1 Signed component perpendicular to transect
def compute_signed_speed_along_transect(u, v, lon, lat):
"""
Compute the signed scalar component along the direction perpendicular
to the transect defined by (lon, lat).
"""
N_val = len(u)
w_signed = np.zeros(N_val)
for i in range(N_val):
if N_val == 1:
w_signed[i] = 0.0
continue
if i == 0:
dx = lon[i+1] - lon[i]
dy = lat[i+1] - lat[i]
elif i == N_val-1:
dx = lon[i] - lon[i-1]
dy = lat[i] - lat[i-1]
else:
dx = lon[i+1] - lon[i-1]
dy = lat[i+1] - lat[i-1]
nx, ny = dy, -dx
norm_n = np.sqrt(nx**2 + ny**2)
if norm_n < 1e-10:
w_signed[i] = 0.0
continue
nx, ny = nx / norm_n, ny / norm_n
w_signed[i] = (u[i]*nx + v[i]*ny)
return w_signed
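As a quick sanity check on the sign convention, here is a self-contained demo (the function body is restated compactly so the snippet runs on its own; the transect values are synthetic): a transect heading due east has right-hand normal (dy, -dx) pointing south, so a purely northward current projects negatively.

```python
import numpy as np

def compute_signed_speed_along_transect(u, v, lon, lat):
    """Compact restatement of the function above, for a standalone demo."""
    N_val = len(u)
    w_signed = np.zeros(N_val)
    for i in range(N_val):
        if N_val == 1:
            continue
        if i == 0:
            dx, dy = lon[i+1] - lon[i], lat[i+1] - lat[i]
        elif i == N_val - 1:
            dx, dy = lon[i] - lon[i-1], lat[i] - lat[i-1]
        else:
            dx, dy = lon[i+1] - lon[i-1], lat[i+1] - lat[i-1]
        nx, ny = dy, -dx                     # right-hand normal to the track
        norm_n = np.hypot(nx, ny)
        if norm_n < 1e-10:
            continue
        w_signed[i] = (u[i] * nx + v[i] * ny) / norm_n
    return w_signed

# Transect heading due east (dx > 0, dy = 0): the normal points south,
# so a purely northward current (u = 0, v = +1) projects to -1.
lon = np.array([5.0, 5.1, 5.2])
lat = np.array([43.0, 43.0, 43.0])
u = np.zeros(3)
v = np.ones(3)
print(compute_signed_speed_along_transect(u, v, lon, lat))  # → [-1. -1. -1.]
```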
## 4.2 Signed full-speed magnitude
def compute_signed_speed_tot(u, v, lon, lat):
"""
Return signed total speed: |(u, v)| with sign determined by the
projection onto (nx, ny).
"""
N_val = len(u)
w_signed_tot = np.zeros(N_val)
for i in range(N_val):
if N_val == 1:
w_signed_tot[i] = 0.0
continue
if i == 0:
dx = lon[i+1] - lon[i]
dy = lat[i+1] - lat[i]
elif i == N_val-1:
dx = lon[i] - lon[i-1]
dy = lat[i] - lat[i-1]
else:
dx = lon[i+1] - lon[i-1]
dy = lat[i+1] - lat[i-1]
nx, ny = dy, -dx
norm_n = np.sqrt(nx**2 + ny**2)
speed = np.sqrt(u[i]**2 + v[i]**2)
if norm_n < 1e-10:
w_signed_tot[i] = 0.0
continue
nx, ny = nx / norm_n, ny / norm_n
sign_ = 1.0 if (u[i]*nx + v[i]*ny) >= 0 else -1.0
w_signed_tot[i] = sign_ * speed
return w_signed_tot
# 5. MVP TRANSECT PROCESSING LOOP
## 5.0 Preliminary set-up
PLs = np.array([5, 6, 7, 11, 12, 15, 17])
for PL in tqdm(PLs):
# 5.1 LOAD MVP DATA AND COMPUTE DYNAMIC HEIGHT
## 5.1.1 Read .mat metadata and convert dates
mat_file = sio.loadmat(
'/Volumes/Extreme SSD/MIO/MVP_TOOLS/BIOSWOTMED/'
f'MVP/L2/PL{PL}.mat'
)
date_extract = extract_values(mat_file['metadata']['date'])
date_pd = pd.DataFrame(date_extract)
datejj = date_pd[0]
dates_mvp = [
datetime.fromordinal(int(d[0][0])) +
timedelta(days=d[0][0] % 1) -
timedelta(days=366)
for d in datejj
]
## 5.1.2 Oceanographic variables and centred dynamic height anomaly
mat = sio.loadmat(
'/Volumes/Extreme SSD/MIO/MVP_TOOLS/BIOSWOTMED/'
f'MVP/L2_python/PL{PL}_python.mat'
)
lon_mvp, lat_mvp = mat['lon'], mat['lat']
p_mvp, t_mvp = mat['pres'], mat['temp']
SP_mvp = mat['sal']
SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
g_ = 9.81 # gravitational acceleration (m s⁻²)
dh = []
for i in range(SA.shape[0]):
dh_i = (
gsw.geo_strf_dyn_height(
SA[i][~np.isnan(SA[i])],
CT[i][~np.isnan(SA[i])],
p_mvp[i][~np.isnan(SA[i])],
p_ref=np.max(p_mvp[i][~np.isnan(SA[i])])
)[0] / g_
)
dh.append(dh_i)
dhmean = np.nanmean(dh)  # nan-safe mean, consistent with the optimization section below
dhc = dh - dhmean # centred anomaly
df_mvp_raw = pd.DataFrame({
'dh' : dh,
'dhc': dhc,
'lon': lon_mvp[0],
'lat': lat_mvp[0]
})
# 5.2 DISTANCE METRIC AND REGULAR GRIDDING
## 5.2.1 Cumulative distance along transect
dist_along_mvp = np.zeros(len(df_mvp_raw))
for i in range(1, len(df_mvp_raw)):
dist_along_mvp[i] = (
dist_along_mvp[i-1] +
geopy.distance.distance(
(df_mvp_raw['lat'].values[i-1],
df_mvp_raw['lon'].values[i-1]),
(df_mvp_raw['lat'].values[i],
df_mvp_raw['lon'].values[i])
).km
)
df_mvp_raw['dist'] = dist_along_mvp
## 5.2.2 Interpolation to 1-km grid and 11-km smoothing
step_km = 1
window_size_km = 11
dist_max = df_mvp_raw['dist'].iloc[-1]
dist_reg = np.arange(0, dist_max + step_km, step_km)
lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
lat_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
dh_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
dhc_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])
window_size = int(window_size_km / step_km)
if window_size % 2 == 0:
window_size += 1 # enforce odd window length
dh_reg_smooth = (
pd.Series(dh_reg)
.rolling(window=window_size, center=True)
.mean()
.values
)
dhc_reg_smooth = (
pd.Series(dhc_reg)
.rolling(window=window_size, center=True)
.mean()
.values
)
df_mvp = pd.DataFrame({
'dist' : dist_reg,
'lon' : lon_reg,
'lat' : lat_reg,
'dh_raw' : dh_reg,
'dhc_raw' : dhc_reg,
'dh' : dh_reg_smooth,
'dhc' : dhc_reg_smooth
}).dropna(subset=['dh', 'dhc']).copy()
# 5.3 GEOSTROPHIC VELOCITY FROM MVP
w_mvp_smooth = gsw.geostrophy.geostrophic_velocity(
df_mvp['dh'].values,
df_mvp['lon'].values,
df_mvp['lat'].values
)
N_mvp = len(w_mvp_smooth[0])
df_mvp = df_mvp.iloc[:N_mvp].copy()
u_mvp = np.zeros(N_mvp)
v_mvp = np.zeros(N_mvp)
for i in range(N_mvp):
if N_mvp > 1:
if i == 0:
dx = w_mvp_smooth[1][i+1] - w_mvp_smooth[1][i]
dy = w_mvp_smooth[2][i+1] - w_mvp_smooth[2][i]
elif i == N_mvp-1:
dx = w_mvp_smooth[1][i] - w_mvp_smooth[1][i-1]
dy = w_mvp_smooth[2][i] - w_mvp_smooth[2][i-1]
else:
dx = w_mvp_smooth[1][i+1] - w_mvp_smooth[1][i-1]
dy = w_mvp_smooth[2][i+1] - w_mvp_smooth[2][i-1]
angle = np.arctan2(dx, dy)
u_mvp[i] = - w_mvp_smooth[0][i] * g_ * np.cos(angle)
v_mvp[i] = w_mvp_smooth[0][i] * g_ * np.sin(angle)
df_mvp['u_mvp'] = u_mvp
df_mvp['v_mvp'] = v_mvp
# 5.4 MATCH MVP DATES TO SWOT
## 5.4.1 Identify corresponding SWOT file
dayv = None
for dsw in dates_swot:
if dsw == dates_mvp[0].date():
dayv = dsw.strftime('%Y-%m-%d')
break
if not dayv:
print(f"No matching (SWOT) date for PL{PL}")
continue
domain = [3, 38.5, 6.5, 44]
dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
varn_swot = {
'longitude': 'longitude',
'latitude' : 'latitude',
'u' : 'ugos',
'v' : 'vgos',
'adt' : 'adt'
}
fname = glob.glob(
'/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/'
f'SWOT_L3_LR_SSH_Expert_*{dayv_dt.strftime("%Y%m%d")}*_*.nc'
)
if not fname:
print(f"No SWOT file found for date: {dayv} - PL{PL}")
continue
swot = loadsshuv(fname[0], '', varn_swot, type='swot', area=domain)
adt_flat_swot = swot['adt'].flatten()
lon_flat_swot = swot['lon'].flatten()
lat_flat_swot = swot['lat'].flatten()
u_swot_tot = swot['u'].flatten()
v_swot_tot = swot['v'].flatten()
adtmean_swot = np.nanmean(adt_flat_swot)
adtc_swot = adt_flat_swot - adtmean_swot
# 5.4.2 Nearest-neighbour search to MVP transect
valid_mask = ~np.isnan(lon_flat_swot) & ~np.isnan(lat_flat_swot)
lon_valid = lon_flat_swot[valid_mask]
lat_valid = lat_flat_swot[valid_mask]
u_valid = u_swot_tot[valid_mask]
v_valid = v_swot_tot[valid_mask]
adt_valid = adt_flat_swot[valid_mask]
tree = cKDTree(np.c_[lon_valid, lat_valid])
_, indices = tree.query(
np.c_[df_mvp['lon'].values, df_mvp['lat'].values], k=1
)
u_swot = u_valid[indices]
v_swot = v_valid[indices]
adt_on_transect = adt_valid[indices]
# 5.5 DIAGNOSTIC SCALING AND SWOT-FROM-ADT COMPUTATION
coeff_scaled = 100000
u_mvp_scaled = df_mvp['u_mvp'].values * coeff_scaled
v_mvp_scaled = df_mvp['v_mvp'].values * coeff_scaled
## 5.5.1 Geostrophic velocity from interpolated SWOT ADT
adt_swot_interp = griddata(
(lon_valid, lat_valid),
adt_valid,
(df_mvp['lon'].values, df_mvp['lat'].values),
method='linear'
)
w_swot_calc_raw = gsw.geostrophy.geostrophic_velocity(
adt_swot_interp,
df_mvp['lon'].values,
df_mvp['lat'].values
)
N2 = min(len(w_swot_calc_raw[0]),
len(w_swot_calc_raw[1]),
len(w_swot_calc_raw[2]))
u_swot_calc = np.zeros(N2)
v_swot_calc = np.zeros(N2)
for i in range(N2):
if N2 == 1:
dx2 = dy2 = 0
elif i == 0:
dx2 = w_swot_calc_raw[1][i+1] - w_swot_calc_raw[1][i]
dy2 = w_swot_calc_raw[2][i+1] - w_swot_calc_raw[2][i]
elif i == N2-1:
dx2 = w_swot_calc_raw[1][i] - w_swot_calc_raw[1][i-1]
dy2 = w_swot_calc_raw[2][i] - w_swot_calc_raw[2][i-1]
else:
dx2 = w_swot_calc_raw[1][i+1] - w_swot_calc_raw[1][i-1]
dy2 = w_swot_calc_raw[2][i+1] - w_swot_calc_raw[2][i-1]
angle2 = np.arctan2(dx2, dy2)
u_swot_calc[i] = - w_swot_calc_raw[0][i] * g_ * np.cos(angle2)
v_swot_calc[i] = w_swot_calc_raw[0][i] * g_ * np.sin(angle2)
scale_calc = 100000
u_swot_calc_scaled = u_swot_calc * scale_calc
v_swot_calc_scaled = v_swot_calc * scale_calc
# 6. VISUALISATION AND COMPARATIVE ANALYSIS
## 6.1 Map-based diagnostics
### 6.1.1 ADT-based velocity vs MVP
min_len_calc = min(N2, len(df_mvp))
plt.figure(figsize=(12, 8))
m = Basemap(
projection='merc',
llcrnrlat=df_mvp['lat'].min() - 0.4,
urcrnrlat=df_mvp['lat'].max() + 0.4,
llcrnrlon=df_mvp['lon'].min() - 0.4,
urcrnrlon=df_mvp['lon'].max() + 0.4,
resolution='i'
)
m.drawcoastlines()
m.drawcountries()
m.drawmapboundary(fill_color='lightblue')
m.fillcontinents(color='beige', lake_color='lightblue')
m.drawparallels(np.arange(-90., 91., 1.), labels=[True, False, False, False])
m.drawmeridians(np.arange(-180., 181., 1.), labels=[False, False, False, True])
x_swot_map, y_swot_map = m(lon_valid, lat_valid)
sc = m.scatter(
x_swot_map, y_swot_map,
c=adt_valid - np.nanmean(adt_valid),
cmap='Spectral_r', vmin=-0.1, vmax=0.1, zorder=1
)
x_pl_map_1, y_pl_map_1 = m(
df_mvp['lon'].values[:min_len_calc],
df_mvp['lat'].values[:min_len_calc]
)
m.scatter(x_pl_map_1, y_pl_map_1, color='black', label=f'PL{PL}', zorder=5, s=10)
q1 = m.quiver(
x_pl_map_1, y_pl_map_1,
u_swot_calc_scaled[:min_len_calc],
v_swot_calc_scaled[:min_len_calc],
angles='xy', scale_units='xy', scale=1,
color=swot_color_adt, zorder=5,
label='SWOT velocity (from ADT)'
)
m.quiver(
x_pl_map_1, y_pl_map_1,
u_mvp_scaled[:min_len_calc],
v_mvp_scaled[:min_len_calc],
angles='xy', scale_units='xy', scale=1,
color=mvp_color, zorder=10, label='MVP velocity'
)
plt.quiverkey(q1, 0.80, 0.1, 0.10 * scale_calc, '0.10 m/s',
labelpos='E', coordinates='axes', color='black')
plt.colorbar(sc, label='SWOT adt (m) [unsmoothed]')
plt.title(f'PL{PL} - {dayv_dt.strftime("%Y-%m-%d")} - (1) ADT-based Velocity vs MVP')
plot_fronte(m)
plt.legend(loc='upper right')
plt.show()
### 6.1.2 Full SWOT velocity vs MVP
min_len_uv = min(len(u_swot), len(df_mvp))
plt.figure(figsize=(12, 8))
m = Basemap(
projection='merc',
llcrnrlat=df_mvp['lat'].min() - 0.4,
urcrnrlat=df_mvp['lat'].max() + 0.4,
llcrnrlon=df_mvp['lon'].min() - 0.4,
urcrnrlon=df_mvp['lon'].max() + 0.4,
resolution='i'
)
m.drawcoastlines()
m.drawcountries()
m.drawmapboundary(fill_color='lightblue')
m.fillcontinents(color='beige', lake_color='lightblue')
m.drawparallels(np.arange(-90., 91., 1.), labels=[True, False, False, False])
m.drawmeridians(np.arange(-180., 181., 1.), labels=[False, False, False, True])
x_pl, y_pl = m(
df_mvp['lon'].values[:min_len_uv],
df_mvp['lat'].values[:min_len_uv]
)
x_swot, y_swot = m(lon_valid, lat_valid)
sc = m.scatter(
x_swot, y_swot,
c=adt_valid - np.nanmean(adt_valid),
cmap='Spectral_r', vmin=-0.1, vmax=0.1, zorder=1
)
m.scatter(x_pl, y_pl, color='black', label=f'PL{PL}', zorder=5, s=10)
u_swot_scaled_tot = u_swot[:min_len_uv] * coeff_scaled
v_swot_scaled_tot = v_swot[:min_len_uv] * coeff_scaled
q2 = m.quiver(
x_pl, y_pl,
u_swot_scaled_tot, v_swot_scaled_tot,
angles='xy', scale_units='xy', scale=1,
color=swot_color_total, zorder=5,
label='SWOT velocity (TOTAL)'
)
m.quiver(
x_pl, y_pl,
u_mvp_scaled[:min_len_uv], v_mvp_scaled[:min_len_uv],
angles='xy', scale_units='xy', scale=1,
color=mvp_color, zorder=10, label='MVP velocity'
)
plt.quiverkey(q2, 0.80, 0.1, 0.10 * coeff_scaled, '0.10 m/s',
labelpos='E', coordinates='axes', color='black')
plt.colorbar(sc, label='SWOT adt (m)')
plt.title(f'PL{PL} - {dayv_dt.strftime("%Y-%m-%d")} - (2) FULL Velocity')
plot_fronte(m)
plt.legend(loc='upper right')
plt.show()
### 6.1.3 Projected SWOT velocity vs MVP
w_swot_proj_sign_map = compute_signed_speed_along_transect(
u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
)
u_swot_proj_map = np.zeros(len(df_mvp))
v_swot_proj_map = np.zeros(len(df_mvp))
for i in range(len(df_mvp)):
if len(df_mvp) == 1:
dx = dy = 0
elif i == 0:
dx = df_mvp['lon'].values[i+1] - df_mvp['lon'].values[i]
dy = df_mvp['lat'].values[i+1] - df_mvp['lat'].values[i]
elif i == len(df_mvp) - 1:
dx = df_mvp['lon'].values[i] - df_mvp['lon'].values[i-1]
dy = df_mvp['lat'].values[i] - df_mvp['lat'].values[i-1]
else:
dx = df_mvp['lon'].values[i+1] - df_mvp['lon'].values[i-1]
dy = df_mvp['lat'].values[i+1] - df_mvp['lat'].values[i-1]
nx, ny = dy, -dx
norm_n = np.sqrt(nx**2 + ny**2)
if norm_n < 1e-10:
u_swot_proj_map[i] = 0.
v_swot_proj_map[i] = 0.
else:
nx, ny = nx / norm_n, ny / norm_n
scalar_proj = w_swot_proj_sign_map[i]
u_swot_proj_map[i] = scalar_proj * nx
v_swot_proj_map[i] = scalar_proj * ny
plt.figure(figsize=(12, 8))
m = Basemap(
projection='merc',
llcrnrlat=df_mvp['lat'].min() - 0.4,
urcrnrlat=df_mvp['lat'].max() + 0.4,
llcrnrlon=df_mvp['lon'].min() - 0.4,
urcrnrlon=df_mvp['lon'].max() + 0.4,
resolution='i'
)
m.drawcoastlines()
m.drawcountries()
m.drawmapboundary(fill_color='lightblue')
m.fillcontinents(color='beige', lake_color='lightblue')
m.drawparallels(np.arange(-90., 91., 1.), labels=[True, False, False, False])
m.drawmeridians(np.arange(-180., 181., 1.), labels=[False, False, False, True])
x_swot, y_swot = m(lon_valid, lat_valid)
sc = m.scatter(
x_swot, y_swot,
c=adt_valid - np.nanmean(adt_valid),
cmap='Spectral_r', vmin=-0.1, vmax=0.1, zorder=1
)
x_pl, y_pl = m(df_mvp['lon'].values, df_mvp['lat'].values)
m.scatter(x_pl, y_pl, color='black', label=f'PL{PL}', zorder=5, s=10)
u_swot_proj_scaled_map = u_swot_proj_map * coeff_scaled
v_swot_proj_scaled_map = v_swot_proj_map * coeff_scaled
q3 = m.quiver(
x_pl, y_pl,
u_swot_proj_scaled_map, v_swot_proj_scaled_map,
angles='xy', scale_units='xy', scale=1,
color=swot_color_proj, zorder=5,
label='SWOT velocity (Projected)'
)
m.quiver(
x_pl, y_pl,
u_mvp_scaled, v_mvp_scaled,
angles='xy', scale_units='xy', scale=1,
color=mvp_color, zorder=10, label='MVP velocity'
)
plt.quiverkey(q3, 0.80, 0.1, 0.10 * coeff_scaled, '0.10 m/s',
labelpos='E', coordinates='axes', color='black')
plt.colorbar(sc, label='SWOT adt (m)')
plt.title(
f'PL{PL} - {dayv_dt.strftime("%Y-%m-%d")} - (3) PROJECTED Velocity (New Method)'
)
plot_fronte(m)
plt.legend(loc='upper right')
plt.show()
## 6.2 One-dimensional signed velocity comparison
### 6.2.1 Signed velocity profiles
w_swot_tot_sign = - compute_signed_speed_tot(
u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
)
w_swot_proj_sign = - compute_signed_speed_along_transect(
u_swot, v_swot, df_mvp['lon'].values, df_mvp['lat'].values
)
w_mvp_sign = - compute_signed_speed_along_transect(
df_mvp['u_mvp'].values, df_mvp['v_mvp'].values,
df_mvp['lon'].values, df_mvp['lat'].values
)
w_swot_calc_sign = - compute_signed_speed_along_transect(
u_swot_calc, v_swot_calc,
df_mvp['lon'].values[:N2], df_mvp['lat'].values[:N2]
)
duacsdir = '/Volumes/Extreme SSD/MIO/DUACS/'
savefig = ''
# DUACS reference velocity
adt_duacs = load_DUACS(df_mvp, dayv, duacsdir, savefig)
if adt_duacs is not None:
w_duacs_raw = gsw.geostrophy.geostrophic_velocity(
adt_duacs, df_mvp['lon'].values, df_mvp['lat'].values
)
N_duacs = len(w_duacs_raw[0])
u_duacs = np.zeros(N_duacs)
v_duacs = np.zeros(N_duacs)
for i in range(N_duacs):
if N_duacs == 1:
dx_d = dy_d = 0
elif i == 0:
dx_d = w_duacs_raw[1][i+1] - w_duacs_raw[1][i]
dy_d = w_duacs_raw[2][i+1] - w_duacs_raw[2][i]
elif i == N_duacs - 1:
dx_d = w_duacs_raw[1][i] - w_duacs_raw[1][i-1]
dy_d = w_duacs_raw[2][i] - w_duacs_raw[2][i-1]
else:
dx_d = w_duacs_raw[1][i+1] - w_duacs_raw[1][i-1]
dy_d = w_duacs_raw[2][i+1] - w_duacs_raw[2][i-1]
angle_d = np.arctan2(dx_d, dy_d)
u_duacs[i] = - w_duacs_raw[0][i] * g_ * np.cos(angle_d)
v_duacs[i] = w_duacs_raw[0][i] * g_ * np.sin(angle_d)
w_duacs_sign_full = - compute_signed_speed_along_transect(
u_duacs, v_duacs,
df_mvp['lon'].values, df_mvp['lat'].values
)
else:
w_duacs_sign_full = np.full(len(df_mvp), np.nan)
min_len_4 = min(
len(w_mvp_sign), len(w_swot_tot_sign), len(w_swot_proj_sign),
len(w_swot_calc_sign), len(w_duacs_sign_full)
)
x_4 = df_mvp['dist'].values[:min_len_4]
w_mvp_4 = w_mvp_sign[:min_len_4]
w_swot_tot_4 = w_swot_tot_sign[:min_len_4]
w_swot_proj_4 = w_swot_proj_sign[:min_len_4]
w_swot_calc_4 = w_swot_calc_sign[:min_len_4]
w_duacs_4 = w_duacs_sign_full[:min_len_4]
duacs_color = "magenta"
plt.figure()
plt.plot(x_4, w_mvp_4, 'o-', color=mvp_color, label='MVP (signed)')
plt.plot(x_4, w_swot_tot_4, 's-', color=swot_color_total, label='SWOT TOTAL (signed)')
plt.plot(x_4, w_swot_proj_4, '^-', color=swot_color_proj, label='SWOT Projected (signed)')
plt.plot(x_4, w_swot_calc_4, 'd-', color=swot_color_adt, label='SWOT from ADT (signed)')
plt.plot(x_4, w_duacs_4, 'p-', color=duacs_color, label='DUACS (signed)')
plt.grid(True)
plt.legend()
plt.xlabel('Distance along transect (km)')
plt.ylabel('Velocity (m/s) [signed]')
plt.title(
f'PL{PL} - (4) Signed Velocity Comparison '
'(MVP, TOTAL, Projected, ADT-based, DUACS)'
)
plt.show()
## 6.3 Extra diagnostic plots
swotdir = '/Volumes/Extreme SSD/MIO/SWOT_V2.0.0/v2.0.0/'
savefig = '/Volumes/Extreme SSD/MIO/MVP_TOOLS/BIOSWOTMED/figures'
adt_swot_extra = load_SWOT(df_mvp, dayv, swotdir, savefig)
if adt_swot_extra is None:
continue
adtmean_swot_extra = np.nanmean(adt_swot_extra)
adtc_swot_extra = adt_swot_extra - adtmean_swot_extra
adt_duacs = load_DUACS(df_mvp, dayv, duacsdir, savefig)
if adt_duacs is None:
continue
adtmean_duacs = np.nanmean(adt_duacs)
adtc_duacs = adt_duacs - adtmean_duacs
# Trim satellite arrays to MVP transect length
L_full = len(df_mvp)
adt_swot_extra = adt_swot_extra[:L_full]
adtc_swot_extra = adtc_swot_extra[:L_full]
adt_duacs = adt_duacs[:L_full]
adtc_duacs = adtc_duacs[:L_full]
### 6.3.1 Centred anomalies and raw topography
plt.figure()
plt.plot(df_mvp['dist'], df_mvp['dhc'],
label='MVP (centered adt)', color=mvp_color, zorder=5)
plt.plot(df_mvp['dist'], df_mvp['dh'],
'--', label='MVP (raw topography)', color=mvp_color, zorder=5)
plt.plot(df_mvp['dist'], adtc_swot_extra,
label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
plt.plot(df_mvp['dist'], adt_swot_extra,
'--', label='SWOT (raw topography)', color=swot_color_adt, zorder=3)
plt.plot(df_mvp['dist'], adtc_duacs,
label='DUACS (centered adt)', color=swot_color_total, zorder=2)
plt.plot(df_mvp['dist'], adt_duacs,
'--', label='DUACS (raw topography)', color=swot_color_total, zorder=2)
plt.plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
'-', color='grey', zorder=1)
plt.xlabel('Distance along transect (km)')
plt.ylabel('Dynamic Topography Anomaly (m)')
plt.legend(loc='lower right', fontsize=7)
plt.title(f'PL{PL} - Extra Plot 1: Centered anomalies vs raw topography')
plt.show()
### 6.3.2 Raw topography comparison
plt.figure()
plt.plot(df_mvp['dist'], df_mvp['dh'],
label='MVP (raw topography)', color=mvp_color, zorder=5)
plt.plot(df_mvp['dist'], adt_swot_extra,
label='SWOT (raw topography)', color=swot_color_adt, zorder=3)
plt.plot(df_mvp['dist'], adt_duacs,
label='DUACS (raw topography)', color=swot_color_total, zorder=2)
plt.xlabel('Distance along transect (km)')
plt.ylabel('Dynamic Topography Anomaly (m)')
plt.legend(loc='lower right', fontsize=7)
plt.title(f'PL{PL} - Extra Plot 2: Raw dynamic topography comparison')
plt.show()
### 6.3.3 Centered anomalies comparison
plt.figure()
plt.plot(df_mvp['dist'], df_mvp['dhc'],
label='MVP (centered adt)', color=mvp_color, zorder=5)
plt.plot(df_mvp['dist'], adtc_swot_extra,
label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
plt.plot(df_mvp['dist'], adtc_duacs,
label='DUACS (centered adt)', color=swot_color_total, zorder=2)
plt.xlabel('Distance along transect (km)')
plt.ylabel('Dynamic Topography Anomaly (m)')
plt.legend(loc='lower right', fontsize=7)
plt.title(f'PL{PL} - Extra Plot 3: Centered anomalies comparison')
plt.show()
### 6.3.4 Subplots – centred anomalies & differences (I)
fig, axs = plt.subplots(2, 1, sharex=True)
axs[0].plot(df_mvp['dist'], df_mvp['dhc'],
label='MVP (centered adt)', color=mvp_color, zorder=5)
axs[0].plot(df_mvp['dist'], adtc_swot_extra,
label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
axs[0].plot(df_mvp['dist'], adtc_duacs,
label='DUACS (centered adt)', color=swot_color_total, zorder=2)
axs[0].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
'-', color='grey', zorder=1)
axs[0].set_ylabel('Dynamic Topography Anomaly (m)')
axs[0].legend(loc='lower right', fontsize=7)
axs[0].set_ylim(-0.08, 0.08)
axs[1].plot(df_mvp['dist'], adtc_swot_extra - df_mvp['dhc'],
label='SWOT - MVP', color=swot_color_adt, zorder=3)
axs[1].plot(df_mvp['dist'], adtc_duacs - df_mvp['dhc'],
label='DUACS - MVP', color=swot_color_total, zorder=3)
axs[1].plot(df_mvp['dist'], adtc_swot_extra - adtc_duacs,
label='SWOT - DUACS', color='orange', zorder=3)
axs[1].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
'-', color='grey', zorder=1)
axs[1].set_xlabel('Distance (km)')
axs[1].set_ylabel('Measurement difference (m)')
axs[1].legend(loc='lower right', fontsize=7)
axs[1].set_ylim(-0.06, 0.06)
plt.tight_layout()
plt.suptitle(
f'PL{PL} - Extra Plot 4: Subplots – Centered anomalies & differences (I)',
y=1.02
)
plt.show()
### 6.3.5 Subplots – centred anomalies & differences (II)
fig, axs = plt.subplots(2, 1, sharex=True)
axs[0].plot(df_mvp['dist'], df_mvp['dhc'],
label='MVP (centered adt)', color=mvp_color, zorder=5)
axs[0].plot(df_mvp['dist'], adtc_swot_extra,
label='SWOT (centered adt)', color=swot_color_adt, zorder=3)
axs[0].plot(df_mvp['dist'], adtc_duacs,
label='DUACS (centered adt)', color=swot_color_total, zorder=2)
axs[0].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
'-', color='grey', zorder=1)
axs[0].set_ylabel('Dynamic Topography Anomaly (m)')
axs[0].legend(loc='lower right', fontsize=7)
axs[0].set_ylim(-0.08, 0.08)
axs[1].plot(df_mvp['dist'], df_mvp['dhc'] - adtc_swot_extra,
label='MVP - SWOT', color=swot_color_adt, zorder=3)
axs[1].plot(df_mvp['dist'], df_mvp['dhc'] - adtc_duacs,
label='MVP - DUACS', color=swot_color_total, zorder=3)
axs[1].plot(df_mvp['dist'], adtc_swot_extra - adtc_duacs,
label='SWOT - DUACS', color='orange', zorder=3)
axs[1].plot([df_mvp['dist'].min(), df_mvp['dist'].max()], [0, 0],
'-', color='grey', zorder=1)
axs[1].set_xlabel('Distance (km)')
axs[1].set_ylabel('Measurement difference (m)')
axs[1].legend(loc='lower right', fontsize=7)
axs[1].set_ylim(-0.06, 0.06)
plt.tight_layout()
plt.suptitle(
f'PL{PL} - Extra Plot 5: Subplots – Centered anomalies & differences (II)',
y=1.02
)
plt.show()
This second section of the notebook implements a systematic methodology for optimizing the reference pressure level used in dynamic height calculations. This optimization is crucial for accurate comparison between in-situ MVP measurements and satellite altimetry data.
Dynamic height calculations from oceanographic profiles require a reference "level of no motion" - a depth at which horizontal flow is assumed negligible. Traditionally, the deepest measurement depth is used, but this approach may not yield optimal results for satellite validation. This code implements a data-driven approach to identify the reference depth that maximizes agreement between MVP and SWOT measurements.
This optimization framework integrates data from all available transects (PL5-PL24) to ensure the result is robust and not biased by individual profiles. This aggregated approach provides a comprehensive assessment of how the choice of reference level affects the agreement between in-situ and satellite measurements.
Understanding the optimal reference level has important implications beyond validation, providing insight into the vertical structure of ocean circulation in the Northwestern Mediterranean and improving our ability to accurately estimate geostrophic velocities from density profiles.
This section imports the necessary Python libraries for our reference level optimization analysis:
This analysis pipeline builds on the previous validation framework but focuses specifically on optimizing the reference pressure level used for dynamic height calculations. By systematically testing different reference depths, we can identify which level produces the best agreement between MVP and SWOT measurements across all available transects.
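The reference-level scan described above can be sketched with synthetic stand-ins (in the real pipeline the match-up comes from calc_matchup_for_PL, defined below; the candidate levels, the noise model, and the synthetic_mvp_dh helper here are all hypothetical):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Synthetic stand-ins: a "satellite" ADT signal along a transect, and MVP
# dynamic height whose agreement with it depends on the reference level.
n_points = 60
adt_swot = np.sin(np.linspace(0, 2 * np.pi, n_points))
candidates = [200, 300, 400, 500, 600]           # candidate p_ref values (dbar)

def synthetic_mvp_dh(candidate):
    # Hypothetical noise model: agreement is best near 400 dbar.
    noise = 0.1 + abs(candidate - 400) / 400
    return adt_swot + rng.normal(0, noise, n_points)

# Scan the candidates and keep the level with the highest correlation.
scores = {}
for cand in candidates:
    r, _ = pearsonr(synthetic_mvp_dh(cand), adt_swot)
    scores[cand] = r
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

The real analysis applies exactly this pattern, except that each candidate is fed to the full MVP-SWOT match-up pipeline rather than a toy noise model.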
## 1.1 External and standard modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import glob, os
from datetime import datetime, timedelta
import gsw
import scipy.io as sio
import geopy.distance
from scipy.spatial import cKDTree
from scipy.stats import pearsonr
import xarray as xr
from scipy.interpolate import griddata
from tqdm import tqdm
When working with oceanographic data formats, particularly MATLAB files loaded into Python via scipy.io, we often encounter deeply nested data structures that need to be flattened and simplified. The two functions in this section address this challenge:
**get_scalar: single-value extraction.** This utility function recursively traverses nested data structures to extract a single scalar value, which is particularly useful for pulling individual metadata elements, such as timestamps, out of complex structures. It handles the different container types (lists, tuples, NumPy arrays) appropriately: a single-element NumPy array is unwrapped with its .item() method, while lists, tuples, and multi-element arrays are descended into through their first element.

**extract_values: comprehensive flattening.** For more complex nested structures, the extract_values function performs a complete flattening, extracting all primitive values into a single list. It employs a recursive inner helper, recurse, that processes each element: when it encounters an object array (dtype='O'), it recurses into the array; otherwise it appends the element's scalar value.

These functions are essential when processing the MVP data files, which contain metadata like date information and oceanographic measurements stored in complex nested structures. By efficiently extracting and organizing these values, we can proceed with the analysis without being hindered by the original data format.
## 2.1 Scalar and value extraction
def get_scalar(val):
"""Extract a scalar from a possibly nested iterable."""
while isinstance(val, (list, tuple, np.ndarray)):
if isinstance(val, np.ndarray):
if val.size == 1:
val = val.item()
else:
val = val[0]
else:
val = val[0]
return val
def extract_values(nested_array):
"""Flatten a nested array and return the values as a list."""
values = []
def recurse(arr):
for element in arr:
if isinstance(element, np.ndarray):
if element.dtype == 'O':
recurse(element)
else:
values.append(element.item())
elif isinstance(element, tuple):
for item in element:
recurse(item)
else:
values.append(element)
recurse(nested_array)
return values
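A short usage example on MATLAB-style nested arrays, as produced by scipy.io.loadmat (the functions are copied from above so the demo is self-contained; the date values are made up):

```python
import numpy as np

# Copies of get_scalar / extract_values from above, so this demo runs alone.
def get_scalar(val):
    while isinstance(val, (list, tuple, np.ndarray)):
        if isinstance(val, np.ndarray):
            val = val.item() if val.size == 1 else val[0]
        else:
            val = val[0]
    return val

def extract_values(nested_array):
    values = []
    def recurse(arr):
        for element in arr:
            if isinstance(element, np.ndarray):
                if element.dtype == 'O':
                    recurse(element)
                else:
                    values.append(element.item())
            elif isinstance(element, tuple):
                for item in element:
                    recurse(item)
            else:
                values.append(element)
    recurse(nested_array)
    return values

# MATLAB struct fields arrive from loadmat as object arrays of 1x1 arrays,
# e.g. metadata['date'] looks roughly like this (values are illustrative):
dates = np.empty((2, 1), dtype=object)
dates[0, 0] = np.array([[738000.5]])
dates[1, 0] = np.array([[738001.25]])

print(get_scalar(dates[0, 0]))   # → 738000.5
print(extract_values(dates))     # → [738000.5, 738001.25]
```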
The loadsshuv function provides a standardized interface for loading satellite altimetry data from NetCDF files, with special handling for SWOT data products. Similar to the implementation in the previous code section, this function ensures consistent access to oceanographic variables across different data sources.
loadsshuv(fname, rep, varn, **kwargs)
This function handles two primary data loading scenarios based on the source type:

- SWOT data - when kwargs['type'] == 'swot', the file is opened with xarray, optionally subset to a geographic area, swath rows containing only NaN velocities are dropped, and ADT is reconstructed as the sum of the filtered SSHA and the mean dynamic topography (MDT).
- Other altimetry products (e.g., DUACS) - variables are read directly using the names supplied in the varn dictionary.

The function returns a consistent dictionary of oceanographic variables regardless of the source, enabling seamless comparison between different altimetry products. This standardization is crucial for the parameter optimization analysis, as it allows us to apply the same processing pipeline to different datasets. The returned variables include:

- coordinates (lon, lat)
- sea surface height (ssh, ssha)
- geostrophic velocity components (u, v)
- absolute dynamic topography (adt)

This uniform data structure facilitates the consistent application of validation metrics across different reference level candidates in our sensitivity analysis.
## 2.2 Data loading utilities
def loadsshuv(fname, rep, varn, **kwargs):
"""
Load sea-surface height and velocity data from a netCDF file.
"""
if 'type' in kwargs and kwargs['type'] == 'swot':
filei2 = glob.glob(rep + fname)[0]
ds = xr.open_dataset(filei2)
if 'area' in kwargs:
area = kwargs['area']
selection = ((ds.longitude > area[0]) &
(ds.longitude < area[2]) &
(ds.latitude > area[1]) &
(ds.latitude < area[3]))
ds = ds.where(selection)
lons = ds.longitude.to_masked_array()
lats = ds.latitude.to_masked_array()
us = ds.ugos.to_masked_array()
vs = ds.vgos.to_masked_array()
ssha = ds.ssha_filtered.to_masked_array()
mss = ds.mss.to_masked_array()
mdt = ds.mdt.to_masked_array()
mask_row = np.all(np.isnan(us), axis=1)
u = us[~mask_row, :]
v = vs[~mask_row, :]
lon = lons[~mask_row, :]
lat = lats[~mask_row, :]
ssha = ssha[~mask_row, :]
mss = mss[~mask_row, :]
mdt = mdt[~mask_row, :]
adt = ssha + mdt
else:
filei = glob.glob(rep + fname)[0]
ds = xr.open_dataset(filei)
lon = ds[varn['longitude']][:]
lat = ds[varn['latitude']][:]
ssha = ds[varn['ssh']][:]
adt = ds['adt'][:]
u = ds[varn['u']][:]
v = ds[varn['v']][:]
return {'lon': lon, 'lat': lat, 'ssh': ssha, 'u': u, 'v': v,
'ssha': ssha, 'adt': adt}
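The SWOT branch's row-masking and ADT reconstruction (adt = ssha + mdt) can be illustrated with a small NumPy sketch on synthetic arrays (not real SWOT data):

```python
import numpy as np

# Synthetic 4x3 swath: two rows of valid velocities, two all-NaN rows.
us = np.array([[0.1, 0.2, 0.3],
               [np.nan, np.nan, np.nan],
               [0.0, -0.1, 0.2],
               [np.nan, np.nan, np.nan]])
ssha = np.full_like(us, 0.05)   # filtered sea surface height anomaly (m)
mdt = np.full_like(us, -0.02)   # mean dynamic topography (m)

# Drop swath rows where every velocity sample is NaN, as loadsshuv does.
mask_row = np.all(np.isnan(us), axis=1)
u = us[~mask_row, :]
adt = ssha[~mask_row, :] + mdt[~mask_row, :]   # ADT = filtered SSHA + MDT

print(u.shape, adt.shape)         # → (2, 3) (2, 3)
print(np.allclose(adt, 0.03))     # → True
```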
The calc_matchup_for_PL function is the core component of our sensitivity analysis, performing a complete end-to-end processing pipeline for comparing MVP and SWOT measurements along a single transect (PL). This function accepts an optional candidate parameter that specifies the reference pressure level to use for dynamic height calculations.
calc_matchup_for_PL(PL, candidate=None)
Parameters:

- PL: the profile (plongée) number to process
- candidate: optional reference pressure level (depth) for the dynamic height calculation

The function follows a comprehensive four-part processing workflow:

1. MVP data processing and dynamic height computation, using either the deepest pressure of each cast (p_ref) or the specified candidate value
2. Transect construction and cumulative distance calculation
3. Regular-grid interpolation and smoothing
4. SWOT co-location and correlation
The function includes comprehensive error handling, returning None, None in case of failures at any processing stage.
This function's ability to accept different reference pressure levels makes it ideal for our sensitivity analysis. By varying the candidate parameter, we can evaluate how different reference depths affect the agreement between MVP and SWOT measurements across all available transects.
## 2.3 Match-up computation
def calc_matchup_for_PL(PL, candidate=None):
"""
Compute the MVP–SWOT dynamic-height match-up for a single profile.
"""
try:
This section handles the loading and processing of Moving Vessel Profiler (MVP) data to compute dynamic height. The dynamic height is a fundamental oceanographic quantity that represents the vertical integral of density anomalies, providing a measure of pressure gradients that drive geostrophic flow.
The process starts by loading two MATLAB files: a metadata file (PL{PL}.mat) providing the cast timestamps, and a processed file (PL{PL}_python.mat) providing positions, pressure, temperature, and salinity.
For timestamp processing, the code converts MATLAB serial date numbers to Python datetime objects: the integer part gives the ordinal day, the fractional part the time of day, and 366 days are subtracted to account for the different epoch conventions.
The oceanographic measurements are then processed following the TEOS-10 standard: Absolute Salinity is derived from Practical Salinity (gsw.SA_from_SP) and Conservative Temperature from in-situ temperature (gsw.CT_from_t).
The key physical concept here is the dynamic height calculation, which is based on the vertical integration of specific volume anomaly:
$$\Delta D = \int_{p_{\text{ref}}}^{p} \delta(S, T, p') \, dp'$$

where $\Delta D$ is the dynamic height anomaly, $\delta$ is the specific volume anomaly as a function of salinity $S$, temperature $T$, and pressure $p'$, and $p_{\text{ref}}$ is the reference pressure.

The choice of reference pressure ($p_{\text{ref}}$) is crucial and is the focus of our optimization: when candidate is None, we use the maximum pressure of each cast; otherwise the supplied candidate level is applied to every cast. Finally, the function computes centered anomalies by subtracting the mean, allowing for comparison with satellite measurements independent of absolute reference levels.
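The integral above can be worked through with a pure-NumPy trapezoidal sketch on a hypothetical specific volume anomaly profile (the real computation uses gsw.geo_strf_dyn_height below; the exponential profile here is illustrative only):

```python
import numpy as np

g = 9.81                                      # m s^-2
p = np.arange(0.0, 501.0, 10.0)               # pressure levels (dbar)
# Hypothetical specific volume anomaly, decaying with depth (m^3 kg^-1,
# scaled so the surface value comes out of order 10 cm).
delta = 1e-6 * np.exp(-p / 150.0)

def dyn_height_surface(p, delta, p_ref):
    """Trapezoidal estimate of Delta D / g at the surface, integrating the
    specific volume anomaly from p_ref up to p = 0 (1 dbar = 1e4 Pa)."""
    sel = p <= p_ref
    p_pa, d = p[sel] * 1e4, delta[sel]
    dd = np.sum((d[1:] + d[:-1]) / 2.0 * np.diff(p_pa))  # Delta D, m^2 s^-2
    return dd / g                                        # metres

dh_500 = dyn_height_surface(p, delta, p_ref=500.0)
dh_300 = dyn_height_surface(p, delta, p_ref=300.0)
print(dh_500, dh_300)   # deeper reference level -> larger surface anomaly
```

This makes the sensitivity to $p_{\text{ref}}$ concrete: moving the reference level changes every cast's dynamic height, which is exactly what the candidate scan exploits.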
# Part 1 – MVP processing and dynamic-height computation
mvp_mat = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2/PL{PL}.mat')
date_raw = mvp_mat['metadata']['date']
date_extract = extract_values(date_raw)
dates_mvp = []
for d_val in date_extract:
scalar_val = get_scalar(d_val)
dt = datetime.fromordinal(int(float(scalar_val))) + timedelta(days=float(scalar_val) % 1) - timedelta(days=366)
dates_mvp.append(dt)
mvp_mat2 = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2_python/PL{PL}_python.mat')
lon_mvp = mvp_mat2['lon']
lat_mvp = mvp_mat2['lat']
p_mvp = mvp_mat2['pres']
t_mvp = mvp_mat2['temp']
SP_mvp = mvp_mat2['sal']
SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
g_ = 9.81
dh = []
for i in range(SA.shape[0]):
valid = ~np.isnan(SA[i])
if np.sum(valid) == 0:
dh.append(np.nan)
continue
p_ref = np.nanmax(p_mvp[i][valid]) if candidate is None else candidate
if np.isnan(p_ref):
dh.append(np.nan)
continue
dh_i = gsw.geo_strf_dyn_height(SA[i][valid], CT[i][valid],
p_mvp[i][valid], p_ref=p_ref)[0] / g_
if isinstance(dh_i, np.ndarray):
dh_i = dh_i.item() if dh_i.size == 1 else dh_i[0]
dh.append(dh_i)
dh = np.array(dh)
if np.all(np.isnan(dh)):
return None, None
dh_mean = np.nanmean(dh)
dhc = dh - dh_mean # centred anomaly
After computing dynamic height values, the code organizes the data into a structured format and calculates the along-track distance, which will serve as the independent variable for subsequent interpolation and analysis.
The process involves:
Data organization: creating a pandas DataFrame that contains:
- dynamic height (`dh`)
- centered dynamic height anomaly (`dhc`)
- longitude and latitude of each cast

Cumulative distance calculation: computing the along-track distance by summing geodesic distances between consecutive casts.
The geodesic distance calculation is important because it provides an accurate measure of distance along the ship track, accounting for the fact that the Earth is not flat. This is particularly relevant in oceanographic surveys that may span significant distances.
The resulting distance metric serves as the independent variable for the regular-grid interpolation that follows and provides a consistent coordinate for comparing the transect with satellite data.
By organizing the data in this structured format with a well-defined distance metric, we create a foundation for the spatial processing operations that follow.
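The cumulative-distance step can be sketched independently. This sketch uses a haversine (spherical-Earth) approximation rather than geopy's geodesic distance; the two agree to within a few tenths of a percent at these scales:

```python
import numpy as np

def cumulative_distance_km(lat, lon):
    """Cumulative along-track distance (km) using the haversine
    formula on a sphere of radius 6371 km."""
    lat_r, lon_r = np.radians(lat), np.radians(lon)
    dlat, dlon = np.diff(lat_r), np.diff(lon_r)
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat_r[:-1]) * np.cos(lat_r[1:]) * np.sin(dlon / 2) ** 2)
    seg = 2 * 6371.0 * np.arcsin(np.sqrt(a))
    return np.concatenate([[0.0], np.cumsum(seg)])

# Three casts 0.1 degree of latitude apart (~11.1 km) along 5 deg E
d = cumulative_distance_km(np.array([43.0, 43.1, 43.2]),
                           np.array([5.0, 5.0, 5.0]))
print(d)  # ~[0, 11.1, 22.2]
```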
# Part 2 – Transect construction and cumulative distance
N = len(dhc)
df_mvp_raw = pd.DataFrame({
'dh': dh,
'dhc': dhc,
'lon': lon_mvp[0][:N],
'lat': lat_mvp[0][:N]
})
dist = np.zeros(N)
for i in range(1, N):
dist[i] = dist[i-1] + geopy.distance.distance(
(df_mvp_raw['lat'].iloc[i-1], df_mvp_raw['lon'].iloc[i-1]),
(df_mvp_raw['lat'].iloc[i], df_mvp_raw['lon'].iloc[i])
).km
df_mvp_raw['dist'] = dist
This code segment performs two critical data preprocessing steps: spatial interpolation and smoothing of oceanographic measurements.
Interpolation parameters: a horizontal sampling interval (`step_km`, 1 km) and a smoothing window length (`window_size_km`, 11 km).

Linear interpolation: `np.interp()` maps the original, irregularly spaced measurements onto the regular grid.

Smoothing technique: a centered moving average is applied through pandas' `rolling()` method, with `bfill()` and `ffill()` used to manage the data boundaries.

The resulting DataFrame provides a consistently gridded and smoothed dataset, prepared for further oceanographic analysis.
# Part 3 – Regular-grid interpolation and smoothing
step_km = 1 # horizontal sampling interval (km)
window_size_km = 11 # smoothing window (km)
dist_max = df_mvp_raw['dist'].iloc[-1]
dist_reg = np.arange(0, dist_max + step_km, step_km)
lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
lat_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
dh_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
dhc_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])
window_size = int(window_size_km / step_km)
if window_size < 1:
window_size = 1
if window_size % 2 == 0:
window_size += 1
dh_reg_smooth = (pd.Series(dh_reg)
.rolling(window=window_size, center=True)
.mean()
.bfill()
.ffill()
.values)
dhc_reg_smooth = (pd.Series(dhc_reg)
.rolling(window=window_size, center=True)
.mean()
.bfill()
.ffill()
.values)
df_mvp = pd.DataFrame({
'dist': dist_reg,
'lon': lon_reg,
'lat': lat_reg,
'dh': dh_reg_smooth,
'dhc': dhc_reg_smooth
})
The final part of the match-up process involves finding the corresponding SWOT satellite data for the MVP transect and spatially co-locating the measurements. This enables direct comparison between in-situ and satellite observations at coincident points.
The co-location process follows these steps:
1. Temporal matching: SWOT files from the same day as the MVP transect are located by their filename date.
2. Spatial subsetting: the data are restricted to the Northwestern Mediterranean domain through the `loadsshuv` function.
3. Nearest-neighbor matching: a k-d tree returns the closest valid SWOT pixel for each MVP point.
4. Centering and validation: both series are centered on their means, and pairs containing NaN are discarded.
The k-d tree approach provides an efficient spatial matching method that scales well with large datasets. This is important for SWOT data, which has much higher spatial resolution than traditional altimetry.
The function's final output consists of two arrays:
- `mvp_vals`: centered dynamic height anomalies from the MVP
- `swot_vals`: centered ADT anomalies from SWOT

These paired values form the basis for our correlation analysis, allowing us to assess how well SWOT measurements agree with in-situ data for different reference pressure levels.
# Part 4 – SWOT co-location
if len(dates_mvp) == 0:
return None, None
day_str = dates_mvp[0].strftime('%Y-%m-%d')
day_dt = datetime.strptime(day_str, "%Y-%m-%d")
domain = [3, 38.5, 6.5, 44]
varn_swot = {'longitude': 'longitude', 'latitude': 'latitude',
'u': 'ugos', 'v': 'vgos', 'adt': 'adt'}
swot_files = glob.glob(f'/home/favaloro/Téléchargements/SWOT_V2.0.0/v2.0.0/'
f'SWOT_L3_LR_SSH_Expert_*{day_dt.strftime("%Y%m%d")}*_v2.0.0.nc')
if len(swot_files) == 0:
return None, None
swot = loadsshuv(swot_files[0], '', varn_swot, type='swot', area=domain)
adt_swot_raw = swot['adt'].flatten()
lon_swot = swot['lon'].flatten()
lat_swot = swot['lat'].flatten()
valid_mask = (~np.isnan(lon_swot) & ~np.isnan(lat_swot) & ~np.isnan(adt_swot_raw))
lon_swot_valid = lon_swot[valid_mask]
lat_swot_valid = lat_swot[valid_mask]
adt_swot_valid = adt_swot_raw[valid_mask]
if len(lon_swot_valid) == 0:
return None, None
tree = cKDTree(np.c_[lon_swot_valid, lat_swot_valid])
points_mvp = np.c_[df_mvp['lon'].values, df_mvp['lat'].values]
_, indices = tree.query(points_mvp, k=1)
swot_adt_match = adt_swot_valid[indices] # nearest-neighbour values
swot_mean = np.nanmean(swot_adt_match)
swot_adt_centered = swot_adt_match - swot_mean
valid_pts = (~np.isnan(df_mvp['dhc'].values) & ~np.isnan(swot_adt_centered))
if np.sum(valid_pts) == 0:
return None, None
mvp_vals = df_mvp['dhc'].values[valid_pts]
swot_vals = swot_adt_centered[valid_pts]
return mvp_vals, swot_vals
except Exception as e:
print(f"Error in PL{PL}: {e}")
return None, None
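The nearest-neighbour logic in Part 4 can be demonstrated on a synthetic grid (the coordinates are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

# Toy "SWOT" grid and two "MVP" points: the k-d tree returns, for
# each query point, the index of the nearest grid node.  Like the
# pipeline, this treats degrees of longitude and latitude as
# equivalent distances, an approximation acceptable over a small area.
glon, glat = np.meshgrid(np.arange(4.0, 5.1, 0.5),
                         np.arange(42.0, 43.1, 0.5))
tree = cKDTree(np.c_[glon.ravel(), glat.ravel()])
points = np.array([[4.1, 42.1], [4.9, 42.9]])
_, idx = tree.query(points, k=1)
nearest = np.c_[glon.ravel()[idx], glat.ravel()[idx]]
print(nearest)  # [[4.0, 42.0], [5.0, 43.0]]
```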
# 3. AGGREGATED SENSITIVITY ANALYSIS
This section implements a systematic parameter sweep to find the optimal reference pressure level for dynamic height calculations. The ideal "level of no motion" should maximize the correlation between MVP-measured dynamic height and SWOT-observed absolute dynamic topography.
The analysis explores reference depths from 360 m down to 190 m in 10 m steps (`np.arange(360, 180, -10)` excludes the 180 m endpoint). For each candidate depth, the code:
1. Computes match-ups for all transects (PL5 to PL24) with `calc_matchup_for_PL`.
2. Aggregates the resulting MVP and SWOT anomaly pairs across transects.
3. Calculates the Pearson correlation on the aggregated pairs, skipping degenerate cases with near-zero variance.
4. Identifies the optimal reference level as the candidate depth that maximizes the correlation.
This approach has several advantages over analyzing individual transects: it increases the sample size, reduces the influence of noise in any single transect, and yields one robust correlation estimate per candidate depth.
The Pearson correlation coefficient is an appropriate metric for this comparison:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

Where $x_i$ represents MVP measurements, $y_i$ represents SWOT measurements, and $\bar{x}$ and $\bar{y}$ are their respective means.
This metric quantifies the linear relationship between the two datasets, with values closer to 1 indicating stronger agreement. By selecting the reference depth that maximizes this correlation, we identify the level that produces the most consistent comparison between in-situ and satellite measurements.
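As a sanity check, the formula can be written out directly and compared with `scipy.stats.pearsonr` on synthetic anomalies:

```python
import numpy as np
from scipy.stats import pearsonr

def pearson_r(x, y):
    """Pearson correlation coefficient, written out from the formula
    above (equivalent to scipy.stats.pearsonr(x, y)[0])."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))

x = np.array([0.0, 1.0, 2.0, 3.0])   # synthetic "MVP" anomalies
y = np.array([0.1, 0.9, 2.2, 2.8])   # synthetic "SWOT" anomalies
print(round(pearson_r(x, y), 3))     # matches pearsonr(x, y)[0]
```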
## 3.1 Parameter sweep
candidate_depths = np.arange(360, 180, -10) # candidate reference levels (m)
agg_corr_values = []
for cand in candidate_depths:
agg_mvp_list = []
agg_swot_list = []
for PL in np.arange(5, 25):
mvp_vals, swot_vals = calc_matchup_for_PL(PL, candidate=cand)
if mvp_vals is not None and len(mvp_vals) > 0:
agg_mvp_list.append(mvp_vals)
agg_swot_list.append(swot_vals)
if len(agg_mvp_list) == 0:
agg_corr_values.append(np.nan)
else:
agg_mvp_all = np.concatenate(agg_mvp_list)
agg_swot_all = np.concatenate(agg_swot_list)
if np.std(agg_mvp_all) < 1e-12 or np.std(agg_swot_all) < 1e-12:
agg_corr_values.append(np.nan)
else:
r, _ = pearsonr(agg_mvp_all, agg_swot_all)
agg_corr_values.append(r)
agg_corr_values = np.array(agg_corr_values)
best_index = np.nanargmax(agg_corr_values)
best_cand_agg = candidate_depths[best_index]
print(f"Best candidate depth (p_ref) is {best_cand_agg} m with Pearson r = {agg_corr_values[best_index]:.3f}")
The final step in our sensitivity analysis is to visualize how the correlation between MVP and SWOT measurements varies with different reference pressure levels. This graphical representation provides clear insight into the relationship between reference depth and measurement agreement.
The visualization is a line plot with markers showing the Pearson correlation coefficient for each candidate reference depth.
The plot shows how MVP-SWOT agreement varies with reference depth and makes both the location and the sharpness of the correlation maximum immediately visible.
From an oceanographic perspective, this analysis provides valuable insight into the vertical structure of water movement in the Northwestern Mediterranean. The optimal reference level indicates a depth at which horizontal motion is minimal or, at least, where the assumption of a "level of no motion" leads to dynamic height calculations that best match satellite observations.
By systematically investigating this parameter space, we establish a sound empirical basis for selecting the reference level to use in all subsequent analyses comparing MVP and satellite measurements, improving the overall validation methodology.
## 3.2 Visualisation
plt.figure(figsize=(8, 6))
plt.plot(candidate_depths, agg_corr_values, 'o-', markersize=8)
plt.xlabel('Candidate Level of No Motion [m]')
plt.ylabel('Aggregated Pearson r')
plt.title('Aggregated Sensitivity Analysis (190-360 m)')
plt.grid(True)
plt.tight_layout()
plt.show()
# 1. IMPORT LIBRARIES
## 1.1 External and standard modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import glob, os
from datetime import datetime, timedelta
import gsw
import scipy.io as sio
import geopy.distance
from scipy.spatial import cKDTree
from scipy.stats import pearsonr
import xarray as xr
from scipy.interpolate import griddata
from tqdm import tqdm
# 2. AUXILIARY FUNCTIONS
## 2.1 Scalar and value extraction
def get_scalar(val):
"""Extract a scalar from a possibly nested iterable."""
while isinstance(val, (list, tuple, np.ndarray)):
if isinstance(val, np.ndarray):
if val.size == 1:
val = val.item()
else:
val = val[0]
else:
val = val[0]
return val
def extract_values(nested_array):
"""Flatten a nested array and return the values as a list."""
values = []
def recurse(arr):
for element in arr:
if isinstance(element, np.ndarray):
if element.dtype == 'O':
recurse(element)
else:
values.append(element.item())
elif isinstance(element, tuple):
for item in element:
recurse(item)
else:
values.append(element)
recurse(nested_array)
return values
## 2.2 Data loading utilities
def loadsshuv(fname, rep, varn, **kwargs):
"""
Load sea-surface height and velocity data from a netCDF file.
"""
if 'type' in kwargs and kwargs['type'] == 'swot':
filei2 = glob.glob(rep + fname)[0]
ds = xr.open_dataset(filei2)
if 'area' in kwargs:
area = kwargs['area']
selection = ((ds.longitude > area[0]) &
(ds.longitude < area[2]) &
(ds.latitude > area[1]) &
(ds.latitude < area[3]))
ds = ds.where(selection)
lons = ds.longitude.to_masked_array()
lats = ds.latitude.to_masked_array()
us = ds.ugos.to_masked_array()
vs = ds.vgos.to_masked_array()
ssha = ds.ssha_filtered.to_masked_array()
mss = ds.mss.to_masked_array()
mdt = ds.mdt.to_masked_array()
mask_row = np.all(np.isnan(us), axis=1)
u = us[~mask_row, :]
v = vs[~mask_row, :]
lon = lons[~mask_row, :]
lat = lats[~mask_row, :]
ssha = ssha[~mask_row, :]
mss = mss[~mask_row, :]
mdt = mdt[~mask_row, :]
adt = ssha + mdt
else:
filei = glob.glob(rep + fname)[0]
ds = xr.open_dataset(filei)
lon = ds[varn['longitude']][:]
lat = ds[varn['latitude']][:]
ssha = ds[varn['ssh']][:]
adt = ds['adt'][:]
u = ds[varn['u']][:]
v = ds[varn['v']][:]
return {'lon': lon, 'lat': lat, 'ssh': ssha, 'u': u, 'v': v,
'ssha': ssha, 'adt': adt}
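The spatial subsetting in `loadsshuv` relies on `xarray.Dataset.where`, which keeps the 2-D swath shape and masks points outside the box with NaN. A toy dataset (coordinates and values are illustrative) makes this behaviour explicit:

```python
import numpy as np
import xarray as xr

# Build a 3x4 swath with 2-D longitude/latitude coordinates, then
# mask everything outside a lon/lat box, as loadsshuv does.
ds = xr.Dataset(
    {"adt": (("y", "x"), np.arange(12.0).reshape(3, 4))},
    coords={
        "longitude": (("y", "x"), np.tile(np.arange(3.0, 7.0), (3, 1))),
        "latitude": (("y", "x"), np.repeat(np.arange(42.0, 45.0), 4).reshape(3, 4)),
    },
)
area = [3.5, 42.5, 6.0, 44.5]  # lonmin, latmin, lonmax, latmax
sel = ((ds.longitude > area[0]) & (ds.longitude < area[2])
       & (ds.latitude > area[1]) & (ds.latitude < area[3]))
sub = ds.where(sel)
print(sub.adt.values)  # NaN outside the box, shape unchanged
```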
Best candidate depth (p_ref) is 330 m with Pearson r = 0.618
This third section of the notebook provides a comprehensive statistical evaluation of SWOT data against in-situ measurements, with particular focus on quantifying the improvements achieved through reference level optimization.
While the previous section identified the optimal reference depth, this code implements a rigorous statistical framework to assess the practical significance of this optimization. Through scatterplots and numerical metrics, it provides a clear picture of how well SWOT captures the spatial variations in sea surface height measured by the MVP.
This statistical validation quantifies the MVP-SWOT agreement with correlation, RMSE, regression slope, and standard deviation metrics, computed for both the default and the optimized reference level.
By comparing performance metrics between the default and optimized approaches, this analysis quantifies the practical benefits of reference level optimization. The results reveal not only whether SWOT and MVP measurements agree, but also how this agreement is influenced by methodological choices in data processing.
These statistical insights provide crucial guidance for future validation efforts and help establish best practices for comparing satellite altimetry with in-situ oceanographic measurements in the Mediterranean Sea and beyond.
The calc_matchup_for_PL function serves as the core processing pipeline for our analysis, structured into four logical components that handle the complete workflow from raw data to matched measurement pairs.
This function takes two parameters:
- `PL`: the profile ("plongée") number to process
- `candidate`: optional reference pressure level (in dbar) for the dynamic height calculation

The function performs comprehensive end-to-end processing, transforming raw MVP profiles into centered anomaly pairs matched with SWOT observations.
This modular design allows us to systematically test different reference pressure levels and evaluate their impact on MVP-SWOT agreement. The function is robust against various error conditions, with appropriate handling for missing data, invalid measurements, and processing exceptions.
By serving as the foundation for both our parameter optimization and statistical validation, this function ensures consistent processing throughout our analysis pipeline, enhancing the reliability of our results.
## 1.1 calc_matchup_for_PL
def calc_matchup_for_PL(PL, candidate=None):
This section implements the critical first step of our processing pipeline: loading MVP measurements and calculating dynamic height - the oceanographic variable most directly comparable to satellite altimetry.
The function first loads two complementary MATLAB files: one containing the cast metadata (timestamps) and one containing the profile measurements (longitude, latitude, pressure, temperature, and practical salinity).
After converting timestamps from MATLAB's date format to Python datetime objects, the code processes the raw measurements using the TEOS-10 standard:
Converting to absolute properties: practical salinity is converted to Absolute Salinity ($S_A$) with `gsw.SA_from_SP`, and in-situ temperature to Conservative Temperature ($\Theta$) with `gsw.CT_from_t`.

Dynamic height calculation: Dynamic height represents the vertical integration of density anomalies and is calculated profile-by-profile using:

$$\Delta D = \frac{1}{g} \int_{p}^{p_{\text{ref}}} \delta(S_A, \Theta, p') \, dp'$$

Where:
- $\delta$ is the specific volume anomaly,
- $p_{\text{ref}}$ is the reference pressure (the assumed "level of no motion"),
- $g$ is the gravitational acceleration.
Reference level selection:
The function implements a key feature: flexible reference level selection through the candidate parameter:
p_ref = np.nanmax(p_mvp[i][valid]) if candidate is None else candidate
This allows testing different reference depths during our sensitivity analysis.
Centering the anomalies: Finally, the function computes centered anomalies by subtracting the mean:
dhc = dh - np.nanmean(dh)
This facilitates comparison with satellite data by removing systematic offsets.
The code includes robust error handling for missing or invalid data, ensuring that problematic profiles don't compromise the overall analysis.
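A minimal numerical sketch of this integral, using a synthetic specific-volume-anomaly profile (not real MVP data) and explicit trapezoidal integration:

```python
import numpy as np

g = 9.81                             # gravitational acceleration (m s^-2)
p = np.arange(0.0, 301.0, 10.0)      # pressure levels (dbar)
delta = 1e-6 * np.exp(-p / 100.0)    # synthetic specific volume anomaly (m^3 kg^-1)

# Trapezoidal integration of delta from the surface down to the
# reference pressure (300 dbar here), with pressure converted from
# dbar to Pa (1 dbar = 1e4 Pa); dividing by g gives metres.
dp = np.diff(p * 1e4)
dyn_height = np.sum(0.5 * (delta[:-1] + delta[1:]) * dp) / g
print(f"dynamic height at the surface: {dyn_height:.4f} m")
```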
### 1.1.1 MVP data loading and dynamic-height computation
try:
mvp_mat = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2/PL{PL}.mat')
date_raw = mvp_mat['metadata']['date']
date_extract = extract_values(date_raw)
dates_mvp = []
for d_val in date_extract:
scalar_val = get_scalar(d_val)
dt = datetime.fromordinal(int(float(scalar_val))) + timedelta(days=float(scalar_val) % 1) - timedelta(days=366)
dates_mvp.append(dt)
mvp_mat2 = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2_python/PL{PL}_python.mat')
lon_mvp = mvp_mat2['lon']
lat_mvp = mvp_mat2['lat']
p_mvp = mvp_mat2['pres']
t_mvp = mvp_mat2['temp']
SP_mvp = mvp_mat2['sal']
SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
g_ = 9.81
dh = []
for i in range(SA.shape[0]):
valid = ~np.isnan(SA[i])
if np.sum(valid) == 0:
dh.append(np.nan)
continue
p_ref = np.nanmax(p_mvp[i][valid]) if candidate is None else candidate # reference level
if np.isnan(p_ref):
dh.append(np.nan)
continue
dh_i = gsw.geo_strf_dyn_height(SA[i][valid], CT[i][valid],
p_mvp[i][valid], p_ref=p_ref)[0] / g_
if isinstance(dh_i, np.ndarray):
dh_i = dh_i.item() if dh_i.size == 1 else dh_i[0]
dh.append(dh_i)
dh = np.array(dh)
if np.all(np.isnan(dh)):
return None, None
dhc = dh - np.nanmean(dh) # centred anomaly
In this section, the code organizes the dynamic height data into a structured format and calculates along-track distances, creating a well-defined spatial reference system for the MVP transect.
The process consists of two main steps:
Data organization: The code creates a pandas DataFrame containing:
- dynamic height (`dh`)
- centered dynamic height anomaly (`dhc`)
- longitude and latitude of each cast

This structured format facilitates subsequent processing and analysis.
Cumulative distance calculation: For each point along the transect, the code calculates the cumulative distance from the starting point:
dist[i] = dist[i-1] + geopy.distance.distance(
(df_mvp_raw['lat'].iloc[i-1], df_mvp_raw['lon'].iloc[i-1]),
(df_mvp_raw['lat'].iloc[i], df_mvp_raw['lon'].iloc[i])
).km
The function uses geodesic distances that account for Earth's curvature, which is more accurate than simple Euclidean calculations, especially for longer transects.
The resulting distance array is added to the DataFrame, providing a natural coordinate system for the MVP measurements. This distance metric serves as the independent variable for subsequent interpolation and facilitates consistent spatial comparisons with satellite data.
By establishing this transect representation, we create a foundation for the regular-grid interpolation and smoothing operations that follow in the next section of the processing pipeline.
### 1.1.2 Transect construction and cumulative distance
N = len(dhc)
df_mvp_raw = pd.DataFrame({
'dh': dh,
'dhc': dhc,
'lon': lon_mvp[0][:N],
'lat': lat_mvp[0][:N]
})
dist = np.zeros(N)
for i in range(1, N):
dist[i] = dist[i-1] + geopy.distance.distance(
(df_mvp_raw['lat'].iloc[i-1], df_mvp_raw['lon'].iloc[i-1]),
(df_mvp_raw['lat'].iloc[i], df_mvp_raw['lon'].iloc[i])
).km
df_mvp_raw['dist'] = dist
This section performs critical spatial preprocessing to ensure consistent sampling and noise reduction in the MVP data. The process involves two key operations:
Regular-grid interpolation: The code converts irregularly spaced measurements to a uniform 1-kilometer grid:
dist_reg = np.arange(0, dist_max + step_km, step_km)
lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
lat_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
This regular spacing is essential for applying a fixed-width smoothing window and for comparing transects and satellite data on a common grid.
Moving-window smoothing: To reduce high-frequency noise while preserving mesoscale features, the code applies an 11-kilometer centered moving average filter:
dh_reg_smooth = (pd.Series(dh_reg)
.rolling(window=window_size, center=True)
.mean()
.bfill()
.ffill()
.values)
The function ensures proper handling of edge effects through backward and forward filling of missing values.
The smoothing window size (11 km) represents a balance between suppressing small-scale noise in the MVP measurements and preserving the mesoscale features that SWOT can resolve.
Finally, the processed data is organized into a new DataFrame with regular spacing and smoothed values, ready for comparison with satellite observations. This preprocessing step is crucial for fair comparison between in-situ and satellite measurements, as it accounts for their different inherent spatial resolutions and sampling patterns.
### 1.1.3 Regular-grid interpolation and smoothing
step_km = 1 # horizontal sampling interval
window_size_km = 11
dist_max = df_mvp_raw['dist'].iloc[-1]
dist_reg = np.arange(0, dist_max + step_km, step_km)
lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
lat_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
dh_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
dhc_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])
window_size = int(window_size_km / step_km)
if window_size < 1:
window_size = 1
if window_size % 2 == 0:
window_size += 1
dh_reg_smooth = (pd.Series(dh_reg)
.rolling(window=window_size, center=True)
.mean()
.bfill()
.ffill()
.values)
dhc_reg_smooth = (pd.Series(dhc_reg)
.rolling(window=window_size, center=True)
.mean()
.bfill()
.ffill()
.values)
df_mvp = pd.DataFrame({
'dist': dist_reg,
'lon': lon_reg,
'lat': lat_reg,
'dh': dh_reg_smooth,
'dhc': dhc_reg_smooth
})
The final part of the processing pipeline performs the critical task of finding and matching SWOT satellite observations to our processed MVP data points. This spatial and temporal alignment enables direct point-by-point comparison between the two measurement systems.
The process follows a systematic approach:
Temporal matching: The code identifies SWOT data files from the same day as the MVP measurements:
swot_files = glob.glob(f'...SWOT_L3_LR_SSH_Expert_*{day_dt.strftime("%Y%m%d")}*_v2.0.0.nc')
Spatial subsetting: It loads SWOT data within the Northwestern Mediterranean domain (3°-6.5°E, 38.5°-44°N) and extracts valid measurements:
valid_mask = (~np.isnan(lon_swot) & ~np.isnan(lat_swot) & ~np.isnan(adt_swot_raw))
Nearest-neighbor matching: Using a k-d tree spatial index, the code efficiently finds the closest SWOT observation for each MVP point:
tree = cKDTree(np.c_[lon_swot_valid, lat_swot_valid])
_, indices = tree.query(points_mvp, k=1)
This approach provides optimal computational performance, even with the large number of points in SWOT's wide-swath data.
Data centering and filtering: The SWOT ADT values are centered by subtracting their mean, and only valid point pairs are retained:
swot_adt_centered = adt_swot_valid[indices] - np.nanmean(adt_swot_valid[indices])
valid_pts = (~np.isnan(df_mvp['dhc'].values) & ~np.isnan(swot_adt_centered))
The function's output consists of two matched arrays: the centered dynamic height anomalies from the MVP and the centered ADT anomalies from SWOT.
These paired values form the foundation for our correlation analysis, enabling quantitative assessment of how well the satellite measurements agree with in-situ data, and how this agreement varies with different reference pressure levels.
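The pairwise filtering contract can be illustrated on toy arrays (the pipeline centres the SWOT values before filtering; the simplified order here does not change the idea):

```python
import numpy as np

# Keep only indices where both series are valid, then centre each on
# its own mean -- toy stand-ins for the MVP and SWOT anomaly arrays.
mvp = np.array([0.02, np.nan, -0.01, 0.03])
swot = np.array([0.45, 0.47, np.nan, 0.48])
valid = ~np.isnan(mvp) & ~np.isnan(swot)
mvp_c = mvp[valid] - mvp[valid].mean()
swot_c = swot[valid] - swot[valid].mean()
print(mvp_c, swot_c)  # two matched, centred pairs
```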
### 1.1.4 SWOT colocation and output
if len(dates_mvp) == 0:
return None, None
day_dt = datetime.strptime(dates_mvp[0].strftime('%Y-%m-%d'), "%Y-%m-%d")
domain = [3, 38.5, 6.5, 44]
varn_swot = {'longitude': 'longitude', 'latitude': 'latitude',
'u': 'ugos', 'v': 'vgos', 'adt': 'adt'}
swot_files = glob.glob(f'/home/favaloro/Téléchargements/SWOT_V2.0.0/v2.0.0/'
f'SWOT_L3_LR_SSH_Expert_*{day_dt.strftime("%Y%m%d")}*_v2.0.0.nc')
if len(swot_files) == 0:
return None, None
swot = loadsshuv(swot_files[0], '', varn_swot, type='swot', area=domain)
adt_swot_raw = swot['adt'].flatten()
lon_swot = swot['lon'].flatten()
lat_swot = swot['lat'].flatten()
valid_mask = (~np.isnan(lon_swot) & ~np.isnan(lat_swot) & ~np.isnan(adt_swot_raw))
lon_swot_valid = lon_swot[valid_mask]
lat_swot_valid = lat_swot[valid_mask]
adt_swot_valid = adt_swot_raw[valid_mask]
if len(lon_swot_valid) == 0:
return None, None
tree = cKDTree(np.c_[lon_swot_valid, lat_swot_valid])
points_mvp = np.c_[df_mvp['lon'].values, df_mvp['lat'].values]
_, indices = tree.query(points_mvp, k=1)
swot_adt_centered = adt_swot_valid[indices] - np.nanmean(adt_swot_valid[indices])
valid_pts = (~np.isnan(df_mvp['dhc'].values) & ~np.isnan(swot_adt_centered))
if np.sum(valid_pts) == 0:
return None, None
return df_mvp['dhc'].values[valid_pts], swot_adt_centered[valid_pts]
except Exception as e:
print(f"Error in PL{PL}: {e}")
return None, None
This section evaluates the agreement between MVP and SWOT measurements using the default reference level approach, where dynamic height is calculated relative to the maximum pressure (deepest measurement) for each profile.
The analysis consists of three main components:
Data aggregation across all 20 transects (PL5-PL24), creating a comprehensive dataset for robust statistical analysis.
Statistical parameter calculation to quantify the agreement between in-situ and satellite measurements:
Linear regression: Fits a model $y = mx + b$ to determine how well SWOT data predicts MVP measurements
Root Mean Square Error (RMSE): Quantifies the average magnitude of discrepancies: $$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
Pearson correlation coefficient: Measures the strength of the linear relationship: $$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
Standard deviations: Compare the variability in each dataset
Visualization through a scatterplot showing the individual data points, the 1:1 line, and the fitted regression line.
This baseline assessment using the traditional maximum-pressure reference level establishes a reference point for evaluating the impact of our optimized reference level. The deviation of the regression slope from 1.0 indicates systematic differences in the amplitude of measurements between the two systems, while the correlation coefficient quantifies the overall pattern agreement regardless of amplitude.
The statistical metrics from this analysis will be directly compared with those from the optimized reference level to demonstrate the practical significance of our parameter optimization approach.
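These metrics can be sketched on synthetic centred anomalies; here a true slope of 0.8 mimics MVP anomalies having smaller amplitude than the SWOT anomalies (all values are illustrative):

```python
import numpy as np

# Least-squares slope/intercept, RMSE of the pointwise differences,
# and sample standard deviations, as computed in the cell below.
rng = np.random.default_rng(0)
swot = np.linspace(-0.05, 0.05, 50)            # "SWOT" anomalies (m)
mvp = 0.8 * swot + rng.normal(0.0, 0.005, 50)  # "MVP" anomalies (m)

slope, intercept = np.polyfit(swot, mvp, 1)
rmse = np.sqrt(np.mean((mvp - swot) ** 2))
std_mvp, std_swot = np.std(mvp, ddof=1), np.std(swot, ddof=1)
print(f"slope={slope:.2f}  rmse={rmse:.4f} m  std ratio={std_mvp/std_swot:.2f}")
```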
## 2.1 Default reference level (p_max)
agg_mvp_def = []
agg_swot_def = []
for PL in np.arange(5, 25):
    mvp_vals, swot_vals = calc_matchup_for_PL(PL)
    if mvp_vals is not None and len(mvp_vals) > 0:
        agg_mvp_def.append(mvp_vals)
        agg_swot_def.append(swot_vals)
agg_mvp_all_def = np.concatenate(agg_mvp_def) if len(agg_mvp_def) else np.array([])
agg_swot_all_def = np.concatenate(agg_swot_def) if len(agg_swot_def) else np.array([])
if len(agg_swot_all_def):
    slope_def, intercept_def = np.polyfit(agg_swot_all_def, agg_mvp_all_def, 1)
    x_fit_def = np.linspace(np.nanmin(agg_swot_all_def), np.nanmax(agg_swot_all_def), 100)
    y_fit_def = slope_def * x_fit_def + intercept_def
    rmse_def = np.sqrt(np.mean((agg_mvp_all_def - agg_swot_all_def) ** 2))
    r_def, _ = pearsonr(agg_mvp_all_def, agg_swot_all_def)
    std_mvp_def = np.std(agg_mvp_all_def, ddof=1)
    std_swot_def = np.std(agg_swot_all_def, ddof=1)
else:
    slope_def = intercept_def = rmse_def = r_def = std_mvp_def = std_swot_def = np.nan
plt.figure(figsize=(6, 6))
plt.scatter(agg_swot_all_def, agg_mvp_all_def, alpha=0.5, label='Data points')
plt.plot([np.nanmin(agg_swot_all_def), np.nanmax(agg_swot_all_def)],
         [np.nanmin(agg_swot_all_def), np.nanmax(agg_swot_all_def)], 'k--', label='1:1 line')
plt.plot(x_fit_def, y_fit_def, 'r-', label='Regression line')
plt.xlabel('ADT SWOT (centred)')
plt.ylabel('ADT MVP (centred, p_max)')
plt.title('Scatter Plot: p_ref = p_max')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
fig, ax = plt.subplots(figsize=(6, 2))
ax.axis('tight')
ax.axis('off')
table_def = ax.table(
    cellText=[['RMSE', f'{rmse_def:.3f}'],
              ['Pearson r', f'{r_def:.3f}'],
              ['Slope', f'{slope_def:.3f}'],
              ['Std MVP', f'{std_mvp_def:.3f}'],
              ['Std SWOT', f'{std_swot_def:.3f}']],
    colLabels=['Statistic', 'Value'],
    loc='center'
)
table_def.auto_set_font_size(False)
table_def.set_fontsize(10)
ax.set_title('p_ref = p_max')
plt.show()
This section applies our optimized reference level (best_cand_agg) determined from the previous sensitivity analysis to evaluate whether it improves the agreement between MVP and SWOT measurements. Using the optimal reference level rather than the traditional maximum pressure approach represents a key methodological advancement in satellite altimetry validation.
The analysis follows the same structure as the previous section, but now uses the optimized reference pressure for dynamic height calculations. This allows for direct comparison of results to quantify the improvement achieved through parameter optimization.
The scatterplot visualization uses green points to visually distinguish it from the default case, while maintaining the same statistical framework. The fitted regression line shows how the linear relationship between MVP and SWOT measurements changes when using the optimized reference level.
From an oceanographic perspective, the optimal reference level represents the depth at which the assumption of "no motion" produces dynamic height calculations that best match satellite observations. This has important implications for our understanding of the region's vertical structure:
If the optimal depth is significantly different from typical maximum cast depths, it suggests that substantial motion exists at deeper levels
The improvement in correlation (if any) quantifies how much the reference level choice affects our ability to validate satellite measurements
Changes in the regression slope indicate whether the optimal reference level helps align the amplitude of variations between the two measurement systems
These statistical metrics provide concrete evidence for the practical significance of our reference level optimization. By demonstrating a quantitative improvement in satellite-in situ agreement, we establish a more robust methodology for SWOT validation that can be applied to other regions and campaigns.
## 2.2 Optimal reference level (p_optimal)
agg_mvp_opt = []
agg_swot_opt = []
for PL in np.arange(5, 25):
    mvp_vals, swot_vals = calc_matchup_for_PL(PL, candidate=best_cand_agg)
    if mvp_vals is not None and len(mvp_vals) > 0:
        agg_mvp_opt.append(mvp_vals)
        agg_swot_opt.append(swot_vals)
agg_mvp_all_opt = np.concatenate(agg_mvp_opt) if len(agg_mvp_opt) else np.array([])
agg_swot_all_opt = np.concatenate(agg_swot_opt) if len(agg_swot_opt) else np.array([])
if len(agg_swot_all_opt):
    slope_opt, intercept_opt = np.polyfit(agg_swot_all_opt, agg_mvp_all_opt, 1)
    x_fit_opt = np.linspace(np.nanmin(agg_swot_all_opt), np.nanmax(agg_swot_all_opt), 100)
    y_fit_opt = slope_opt * x_fit_opt + intercept_opt
    rmse_opt = np.sqrt(np.mean((agg_mvp_all_opt - agg_swot_all_opt) ** 2))
    r_opt, _ = pearsonr(agg_mvp_all_opt, agg_swot_all_opt)
    std_mvp_opt = np.std(agg_mvp_all_opt, ddof=1)
    std_swot_opt = np.std(agg_swot_all_opt, ddof=1)
else:
    slope_opt = intercept_opt = rmse_opt = r_opt = std_mvp_opt = std_swot_opt = np.nan
plt.figure(figsize=(6, 6))
plt.scatter(agg_swot_all_opt, agg_mvp_all_opt, c='green', alpha=0.5, label='Data points')
plt.plot([np.nanmin(agg_swot_all_opt), np.nanmax(agg_swot_all_opt)],
         [np.nanmin(agg_swot_all_opt), np.nanmax(agg_swot_all_opt)], 'k--', label='1:1 line')
plt.plot(x_fit_opt, y_fit_opt, 'r-', label='Regression line')
plt.xlabel('ADT SWOT (centred)')
plt.ylabel('ADT MVP (centred, p_optimal)')
plt.title('Scatter Plot: p_ref = p_optimal')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
fig, ax = plt.subplots(figsize=(6, 2))
ax.axis('tight')
ax.axis('off')
table_opt = ax.table(
    cellText=[['RMSE', f'{rmse_opt:.3f}'],
              ['Pearson r', f'{r_opt:.3f}'],
              ['Slope', f'{slope_opt:.3f}'],
              ['Std MVP', f'{std_mvp_opt:.3f}'],
              ['Std SWOT', f'{std_swot_opt:.3f}']],
    colLabels=['Statistic', 'Value'],
    loc='center'
)
table_opt.auto_set_font_size(False)
table_opt.set_fontsize(10)
ax.set_title('p_ref = p_optimal')
plt.show()
# 1. FUNCTION DEFINITIONS
## 1.1 calc_matchup_for_PL
def calc_matchup_for_PL(PL, candidate=None):
    # ### 1.1.1 MVP data loading and dynamic-height computation
    try:
        mvp_mat = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2/PL{PL}.mat')
        date_raw = mvp_mat['metadata']['date']
        date_extract = extract_values(date_raw)
        dates_mvp = []
        for d_val in date_extract:
            scalar_val = get_scalar(d_val)
            dt = datetime.fromordinal(int(float(scalar_val))) + timedelta(days=float(scalar_val) % 1) - timedelta(days=366)
            dates_mvp.append(dt)
        mvp_mat2 = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2_python/PL{PL}_python.mat')
        lon_mvp = mvp_mat2['lon']
        lat_mvp = mvp_mat2['lat']
        p_mvp = mvp_mat2['pres']
        t_mvp = mvp_mat2['temp']
        SP_mvp = mvp_mat2['sal']
        SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
        CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
        g_ = 9.81
        dh = []
        for i in range(SA.shape[0]):
            valid = ~np.isnan(SA[i])
            if np.sum(valid) == 0:
                dh.append(np.nan)
                continue
            p_ref = np.nanmax(p_mvp[i][valid]) if candidate is None else candidate  # reference level
            if np.isnan(p_ref):
                dh.append(np.nan)
                continue
            dh_i = gsw.geo_strf_dyn_height(SA[i][valid], CT[i][valid],
                                           p_mvp[i][valid], p_ref=p_ref)[0] / g_
            if isinstance(dh_i, np.ndarray):
                dh_i = dh_i.item() if dh_i.size == 1 else dh_i[0]
            dh.append(dh_i)
        dh = np.array(dh)
        if np.all(np.isnan(dh)):
            return None, None
        dhc = dh - np.nanmean(dh)  # centred anomaly
        # ### 1.1.2 Transect construction and cumulative distance
        N = len(dhc)
        df_mvp_raw = pd.DataFrame({
            'dh': dh,
            'dhc': dhc,
            'lon': lon_mvp[0][:N],
            'lat': lat_mvp[0][:N]
        })
        dist = np.zeros(N)
        for i in range(1, N):
            dist[i] = dist[i-1] + geopy.distance.distance(
                (df_mvp_raw['lat'].iloc[i-1], df_mvp_raw['lon'].iloc[i-1]),
                (df_mvp_raw['lat'].iloc[i], df_mvp_raw['lon'].iloc[i])
            ).km
        df_mvp_raw['dist'] = dist
        # ### 1.1.3 Regular-grid interpolation and smoothing
        step_km = 1  # horizontal sampling interval
        window_size_km = 11
        dist_max = df_mvp_raw['dist'].iloc[-1]
        dist_reg = np.arange(0, dist_max + step_km, step_km)
        lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
        lat_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
        dh_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
        dhc_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])
        window_size = int(window_size_km / step_km)
        if window_size < 1:
            window_size = 1
        if window_size % 2 == 0:
            window_size += 1
        dh_reg_smooth = (pd.Series(dh_reg)
                         .rolling(window=window_size, center=True)
                         .mean()
                         .bfill()
                         .ffill()
                         .values)
        dhc_reg_smooth = (pd.Series(dhc_reg)
                          .rolling(window=window_size, center=True)
                          .mean()
                          .bfill()
                          .ffill()
                          .values)
        df_mvp = pd.DataFrame({
            'dist': dist_reg,
            'lon': lon_reg,
            'lat': lat_reg,
            'dh': dh_reg_smooth,
            'dhc': dhc_reg_smooth
        })
        # ### 1.1.4 SWOT colocation and output
        if len(dates_mvp) == 0:
            return None, None
        day_dt = datetime.strptime(dates_mvp[0].strftime('%Y-%m-%d'), "%Y-%m-%d")
        domain = [3, 38.5, 6.5, 44]
        varn_swot = {'longitude': 'longitude', 'latitude': 'latitude',
                     'u': 'ugos', 'v': 'vgos', 'adt': 'adt'}
        swot_files = glob.glob(f'/home/favaloro/Téléchargements/SWOT_V2.0.0/v2.0.0/'
                               f'SWOT_L3_LR_SSH_Expert_*{day_dt.strftime("%Y%m%d")}*_v2.0.0.nc')
        if len(swot_files) == 0:
            return None, None
        swot = loadsshuv(swot_files[0], '', varn_swot, type='swot', area=domain)
        adt_swot_raw = swot['adt'].flatten()
        lon_swot = swot['lon'].flatten()
        lat_swot = swot['lat'].flatten()
        valid_mask = (~np.isnan(lon_swot) & ~np.isnan(lat_swot) & ~np.isnan(adt_swot_raw))
        lon_swot_valid = lon_swot[valid_mask]
        lat_swot_valid = lat_swot[valid_mask]
        adt_swot_valid = adt_swot_raw[valid_mask]
        if len(lon_swot_valid) == 0:
            return None, None
        tree = cKDTree(np.c_[lon_swot_valid, lat_swot_valid])
        points_mvp = np.c_[df_mvp['lon'].values, df_mvp['lat'].values]
        _, indices = tree.query(points_mvp, k=1)
        swot_adt_centered = adt_swot_valid[indices] - np.nanmean(adt_swot_valid[indices])
        valid_pts = (~np.isnan(df_mvp['dhc'].values) & ~np.isnan(swot_adt_centered))
        if np.sum(valid_pts) == 0:
            return None, None
        return df_mvp['dhc'].values[valid_pts], swot_adt_centered[valid_pts]
    except Exception as e:
        print(f"Error in PL{PL}: {e}")
        return None, None
This fourth section of the notebook implements a sophisticated spectral analysis framework to evaluate SWOT data across different spatial scales. By examining power spectral density and coherence, it provides unique insight into SWOT's ability to resolve oceanographic features of different sizes.
Ocean dynamics operate across a continuum of spatial scales, from basin-wide circulations to small-scale turbulence. Understanding how well satellite measurements capture this multi-scale variability is essential for oceanographic applications. This code implements a comprehensive spectral analysis approach that goes beyond simple point-to-point comparisons.
This spectral framework combines per-segment power spectral density estimation with cross-platform coherence analysis.
By comparing spectral characteristics across three measurement systems (MVP, SWOT, and DUACS), this analysis reveals SWOT's strengths and limitations in capturing different scales of oceanographic variability. The coherence analysis quantifies the scale-dependent agreement between satellite and in-situ measurements, providing a rigorous assessment of SWOT's effective resolution.
These spectral insights are particularly valuable for understanding SWOT's performance in the mesoscale-to-submesoscale transition region, where traditional altimetry has struggled to resolve important oceanographic features that play crucial roles in energy transfer and biogeochemical processes.
This section imports the essential libraries needed for our spectral analysis workflow.
The code also disables RuntimeWarnings, which commonly occur during spectral calculations with potentially NaN-containing data. This suppression helps maintain clean output while processing multiple transects.
This spectral analysis pipeline builds on our previous validation frameworks, but now focuses specifically on frequency-domain characteristics of the data. By examining how energy is distributed across different spatial scales, we can better understand the strengths and limitations of SWOT in capturing ocean variability.
## 1.1 Core scientific stack
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.signal as signal
import warnings
import gsw
import scipy.io as sio
import geopy.distance
from datetime import datetime, timedelta
import glob, os
from scipy.interpolate import griddata
from tqdm import tqdm
import xarray as xr
warnings.filterwarnings("ignore", category=RuntimeWarning)
The configuration section defines key threshold values that control the spectral analysis process. These parameters help ensure the statistical robustness of our results by filtering out segments that are too short to yield reliable spectral estimates.
MIN_SEGMENT_LENGTH = 10 establishes the minimum number of consecutive valid data points required to retain a segment for spectral analysis. This threshold is important for several reasons:
Statistical significance: Reliable spectral estimation requires a sufficient number of samples. Segments that are too short would produce noisy, unreliable spectra that could skew our ensemble results.
Frequency resolution: The minimum resolvable frequency is inversely proportional to the segment length. With 1-km sampling, a 10-point segment allows us to resolve wavelengths up to 10 km, which is appropriate for capturing submesoscale features.
Computational efficiency: Discarding very short segments that would contribute minimal information improves processing efficiency.
This parameter directly affects the segmentation process that occurs later in the code, where continuous portions of data are identified between gaps (NaN values) and evaluated for inclusion in the spectral analysis. By requiring at least 10 valid points, we ensure that each segment can meaningfully contribute to our understanding of the spectral characteristics at scales relevant to SWOT validation.
# 2. CONFIGURATION PARAMETERS
MIN_SEGMENT_LENGTH = 10 # minimum points to retain a segment
The read_mat_datetime function solves a critical interoperability challenge: converting MATLAB's date representation to Python's datetime format. This conversion is essential for temporal alignment between MVP measurements and satellite passes.
MATLAB represents dates as days since January 0, 0000, while Python's datetime uses a different epoch. This discrepancy requires careful handling, especially for oceanographic data where precise temporal matching is crucial.
The function implements a robust approach to extract datetime information from potentially nested MATLAB structures, unwrapping values buried in nested arrays or tuples and collapsing singleton dimensions before conversion.
This utility function ensures accurate temporal alignment between in-situ measurements and satellite observations, which is fundamental for meaningful spectral comparison. Without precise temporal matching, we could inadvertently compare measurements representing different oceanographic conditions, leading to erroneous conclusions about SWOT's performance.
## 3.1 MATLAB ordinal‐date conversion
def read_mat_datetime(date_array):
    """
    Convert MATLAB ordinal dates in 'date_array' to Python datetime objects.
    """
    dates_list = []
    array_flat = date_array.reshape(-1)
    for val_ in array_flat:
        vv = val_
        while isinstance(vv, (list, tuple, np.ndarray)):
            if isinstance(vv, np.ndarray):
                vv = vv.item() if vv.size == 1 else vv.ravel()[0]
            else:  # list or tuple
                vv = vv[0] if vv else None
        if isinstance(vv, (int, float)):
            frac = vv % 1
            try:
                dt_ = datetime.fromordinal(int(vv)) + timedelta(days=frac) - timedelta(days=366)
                dates_list.append(dt_)
            except (ValueError, OverflowError):
                pass
    return dates_list
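The 366-day offset can be sanity-checked on a single value with a stripped-down version of the conversion above (the example datenum 738991.5 is chosen to correspond to 2023-04-15 12:00):

```python
from datetime import datetime, timedelta

def matlab_datenum_to_datetime(dn):
    # MATLAB datenums count days from year 0, Python ordinals from year 1,
    # hence the 366-day correction.
    return datetime.fromordinal(int(dn)) + timedelta(days=dn % 1) - timedelta(days=366)

dt = matlab_datenum_to_datetime(738991.5)
print(dt)  # 2023-04-15 12:00:00
```

The fractional part of the datenum carries the time of day, which is why it is handled separately from the integer ordinal.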
The loadsshuv function provides a unified interface for loading satellite altimetry data from various sources, handling the specific format differences between SWOT and traditional altimetry products.
This utility addresses a common challenge in multi-sensor validation: different satellite data products follow different conventions, variable names, and storage formats. By abstracting these differences into a single function, we ensure consistent processing across all data sources.
The function processes two main data types:
SWOT data (kwargs['type'] == 'swot'): read directly from the L3 swath files, with optional geographic subsetting
Standard altimetry products (e.g., DUACS): variable names resolved through the varn dictionary
For both data types, the function returns a standardized dictionary containing key oceanographic variables: coordinates (lon, lat), sea surface height (ssh, ssha), geostrophic velocities (u, v), and absolute dynamic topography (adt). This consistency in output format is critical for spectral analysis, as it ensures that we're comparing equivalent quantities across different measurement platforms, facilitating direct comparison of their spectral characteristics.
## 4.1 Generic SSH/U/V reader
def loadsshuv(fname, rep, varn, **kwargs):
    """
    Load longitude, latitude, SSH, velocity, and ADT fields from a NetCDF file.
    """
    if 'type' in kwargs and kwargs['type'] == 'swot':
        filei2 = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei2)
        if 'area' in kwargs:
            area = kwargs['area']
            selection = ((ds.longitude > area[0]) &
                         (ds.longitude < area[2]) &
                         (ds.latitude > area[1]) &
                         (ds.latitude < area[3]))
            ds = ds.where(selection)
        lons = ds.longitude.to_masked_array()
        lats = ds.latitude.to_masked_array()
        us = ds.ugos.to_masked_array()
        vs = ds.vgos.to_masked_array()
        ssha = ds.ssha_filtered.to_masked_array()
        mss = ds.mss.to_masked_array()
        mdt = ds.mdt.to_masked_array()
        mask_row = np.all(np.isnan(us), axis=1)
        u = us[~mask_row, :]
        v = vs[~mask_row, :]
        lon = lons[~mask_row, :]
        lat = lats[~mask_row, :]
        ssha = ssha[~mask_row, :]
        mss = mss[~mask_row, :]
        mdt = mdt[~mask_row, :]
        adt = ssha + mdt
    else:
        filei = glob.glob(rep + fname)[0]
        ds = xr.open_dataset(filei)
        lon = ds[varn['longitude']][:]
        lat = ds[varn['latitude']][:]
        ssha = ds[varn['ssh']][:]
        adt = ds['adt'][:]
        u = ds[varn['u']][:]
        v = ds[varn['v']][:]
    return {'lon': lon, 'lat': lat, 'ssh': ssha, 'u': u, 'v': v,
            'ssha': ssha, 'adt': adt}
This function addresses one of the core challenges in satellite validation: matching SWOT observations with MVP in-situ measurements in both space and time. Proper colocation is essential for meaningful spectral comparison, as we need to ensure that both measurement systems are observing the same oceanographic features.
The load_SWOT function performs this matching through the following steps:
Geographic domain determination:
domain = [np.floor(np.min(mvp['lon'])) - 0.5,
          np.floor(np.min(mvp['lat'])) - 0.5,
          np.ceil(np.max(mvp['lon'])) + 0.5,
          np.ceil(np.max(mvp['lat'])) + 0.5]
This creates a rectangular bounding box around the MVP transect with a 0.5° buffer in all directions, ensuring we capture all relevant SWOT data while minimizing computational load.
Temporal matching: The function identifies SWOT files corresponding to the specified date, ensuring temporal agreement between the datasets.
Spatial interpolation:
Using scipy.interpolate.griddata, the function interpolates SWOT ADT values onto the exact MVP sampling locations:
adtc = griddata(points, swot['adt'].flatten(), (mvp['lon'], mvp['lat']), method='linear')
This spatial matching is particularly important for spectral analysis because both series must sample the same locations before their wavenumber spectra can be meaningfully compared.
The resulting interpolated SWOT ADT values are directly comparable to MVP dynamic height measurements, enabling the subsequent spectral analysis to evaluate how well SWOT captures oceanographic variability across different spatial scales.
## 4.2 SWOT colocation
def load_SWOT(mvp, dayv, swotdir, savefig, **kwargs):
    """
    Load SWOT data and interpolate ADT onto MVP points.
    """
    domain = [np.floor(np.min(mvp['lon'])) - 0.5,
              np.floor(np.min(mvp['lat'])) - 0.5,
              np.ceil(np.max(mvp['lon'])) + 0.5,
              np.ceil(np.max(mvp['lat'])) + 0.5]
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn = {'longitude': 'longitude', 'latitude': 'latitude',
            'u': 'ugos', 'v': 'vgos', 'ssh': 'ssha_filtered'}
    fname = glob.glob(swotdir + 'SWOT_L3_LR_SSH_Expert_*_' +
                      dayv_dt.strftime('%Y%m%d') + '*_*.nc')
    if not fname:
        print(f"No SWOT file found for date {dayv}")
        return None
    swot = loadsshuv(fname[0], '', varn, type='swot', area=domain)
    points = np.column_stack((swot['lon'].flatten(), swot['lat'].flatten()))
    adtc = griddata(points, swot['adt'].flatten(), (mvp['lon'], mvp['lat']), method='linear')
    return np.squeeze(adtc)
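The griddata step can be exercised on synthetic data to see the shapes involved; the "swath" field and MVP track below are invented for illustration:

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic "swath": scattered (lon, lat) samples of a smooth toy ADT field (m).
rng = np.random.default_rng(1)
lon = rng.uniform(3.0, 6.5, 500)
lat = rng.uniform(38.5, 44.0, 500)
adt = 0.1 * np.sin(lon) + 0.05 * lat

# Hypothetical MVP track points well inside the sampled domain.
lon_mvp = np.linspace(4.0, 6.0, 20)
lat_mvp = np.linspace(39.5, 43.0, 20)

points = np.column_stack((lon, lat))
adtc = griddata(points, adt, (lon_mvp, lat_mvp), method='linear')
print(adtc.shape)  # (20,)
```

Linear interpolation returns NaN for track points outside the convex hull of the swath samples, which is one reason the NaN-aware segmentation later in this section is needed.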
The load_DUACS function complements the SWOT colocation by performing similar spatial-temporal matching for DUACS (Data Unification and Altimeter Combination System) altimetry products. DUACS represents the current operational standard for multi-mission satellite altimetry, making it an important reference point for evaluating SWOT's performance.
Like its SWOT counterpart, this function performs three key operations:
Temporal matching:
The function identifies DUACS files corresponding to the specified date using their standard naming convention: nrt_europe_allsat_phy_l4_YYYYMMDD_*.nc, where "nrt" stands for Near-Real-Time products.
Data loading: DUACS data is loaded with appropriate variable mapping, noting that DUACS uses different variable names than SWOT (e.g., 'sla' instead of 'ssha_filtered').
Spatial interpolation: The function handles DUACS' regular latitude-longitude grid by first creating a meshgrid before interpolation:
X, Y = np.meshgrid(duacs['lon'].values, duacs['lat'].values)
Then it performs linear interpolation to the MVP points similarly to the SWOT colocation.
Including DUACS in our analysis serves an important purpose: it provides the current operational multi-mission baseline against which SWOT's finer effective resolution can be judged.
By colocating both SWOT and DUACS data onto the same MVP sampling points, we create a three-way comparison framework that will reveal how the spectral characteristics differ across measurement platforms.
## 4.3 DUACS colocation
def load_DUACS(mvp, dayv, duacsdir, savefig, **kwargs):
    """
    Load DUACS data and interpolate ADT onto MVP points.
    """
    dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
    varn = {'longitude': 'longitude', 'latitude': 'latitude',
            'u': 'ugos', 'v': 'vgos', 'ssh': 'sla'}
    fname = glob.glob(duacsdir + 'nrt_europe_allsat_phy_l4_' +
                      dayv_dt.strftime('%Y%m%d') + '_*.nc')
    if not fname:
        print(f"No DUACS file found for date {dayv}")
        return None
    duacs = loadsshuv(fname[0], '', varn)
    X, Y = np.meshgrid(duacs['lon'].values, duacs['lat'].values)
    points = np.column_stack((X.flatten(), Y.flatten()))
    adtd = griddata(points, duacs['adt'].values.flatten(), (mvp['lon'], mvp['lat']), method='linear')
    return np.squeeze(adtd)
The split_on_nan function addresses a critical challenge in oceanographic spectral analysis: handling missing or invalid data. Unlike controlled laboratory settings, oceanographic measurements often contain gaps or regions of invalid data due to sensor limitations, atmospheric interference, or coastal contamination.
This function implements a sophisticated algorithm to identify continuous valid segments within a potentially fragmented dataset:
When a run of max_consecutive_nans or more NaNs is detected, the current segment is closed and a new one begins after the gap. The tolerance parameter max_consecutive_nans (default = 4) provides flexibility in handling isolated missing values: short gaps below the threshold remain inside a segment, while longer gaps split the record.
This segmentation approach is particularly important for spectral analysis because Fourier-based methods require continuous data without gaps. By identifying valid segments, we can compute accurate spectral estimates for each segment independently and then combine them to form ensemble averages.
The resulting segments form the foundation for our subsequent spectral calculations, ensuring that the power spectral density estimates are based on statistically sound data sequences without the contamination that would result from improperly handled missing values.
### 5.1.1 NaN‐aware splitting
def split_on_nan(data_array, max_consecutive_nans=4):
    """
    Split a 1-D array into continuous segments separated by ≥ max_consecutive_nans NaNs.
    """
    segments = []
    n = len(data_array)
    start_idx = 0
    nan_count = 0
    for i in range(n):
        nan_count = nan_count + 1 if np.isnan(data_array[i]) else 0
        if nan_count == max_consecutive_nans:
            # close the segment just before the first NaN of the run
            end_idx = i - max_consecutive_nans + 1
            if end_idx > start_idx:
                segm = data_array[start_idx:end_idx]
                if len(segm) > 1:
                    segments.append(segm)
        if nan_count >= max_consecutive_nans:
            # keep advancing the start index past the NaN run
            start_idx = i + 1
    if start_idx < n:
        segm = data_array[start_idx:]
        if len(segm) > 1:
            segments.append(segm)
    return segments
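A quick check of the splitting behavior on a toy array; this compact reimplementation mirrors split_on_nan and is included only so the example is self-contained:

```python
import numpy as np

def split_on_nan_simple(x, max_consecutive_nans=4):
    # Close a segment whenever a run of >= max_consecutive_nans NaNs appears;
    # shorter NaN runs are tolerated inside a segment.
    segments, start, run = [], 0, 0
    for i, bad in enumerate(np.isnan(x)):
        run = run + 1 if bad else 0
        if run == max_consecutive_nans:
            seg = x[start:i - max_consecutive_nans + 1]
            if len(seg) > 1:
                segments.append(seg)
        if run >= max_consecutive_nans:
            start = i + 1
    if start < len(x) and len(x[start:]) > 1:
        segments.append(x[start:])
    return segments

x = np.array([1., 2., np.nan, 3., np.nan, np.nan, np.nan, np.nan, 4., 5.])
segs = split_on_nan_simple(x)
print([list(s) for s in segs])  # [[1.0, 2.0, nan, 3.0], [4.0, 5.0]]
```

Note that the isolated NaN at index 2 stays inside the first segment (the tolerance at work), while the four-NaN run splits the array in two.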
The welch_psd_1D function implements Welch's method for power spectral density (PSD) estimation, a fundamental technique in spectral analysis that improves upon the basic periodogram by reducing noise while maintaining good frequency resolution.
Welch's method, developed by Peter Welch in 1967, involves dividing the record into overlapping segments, applying a taper window to each segment, computing the periodogram of each windowed segment, and averaging these periodograms to reduce variance.
Mathematically, the Welch PSD estimate is given by:
$$P_{xx}^{\mathrm{Welch}}(f) = \frac{1}{K} \sum_{i=0}^{K-1} \left| \frac{1}{L} \sum_{n=0}^{L-1} w(n) x_i(n) e^{-j2\pi fn} \right|^2$$
where $K$ is the number of segments, $L$ is the segment length, $w(n)$ is the window function, and $x_i(n)$ is the $i$-th data segment.
The implementation in our function includes several key parameters:
nperseg = min(int(n // 2), 256): sets the length of each segment, balancing frequency resolution (larger values) against estimate variance (smaller values)
noverlap = nperseg // 2: sets 50% overlap between consecutive segments, which helps recover information lost at segment edges due to windowing
window = 'hann': uses the Hann window, a good compromise between spectral-leakage reduction and frequency resolution
detrend = 'constant': removes the mean from each segment to prevent spectral leakage at low frequencies
The function returns:
freq: an array of frequency points in cycles per sample unit
psd: the corresponding power spectral density estimates
The minimum length requirement of 8 points ensures sufficient data for meaningful FFT computation, preventing potential issues with very short segments that might arise from the segmentation process.
This implementation of Welch's method provides a robust foundation for analyzing the spectral characteristics of our oceanographic data, enabling the identification of dominant spatial scales and the comparison of spectral signatures across different measurement platforms.
### 5.1.2 Welch PSD
def welch_psd_1D(data_array, fs=1.0):
    """
    Compute 1-D PSD via Welch’s method (Hann window).
    """
    n = len(data_array)
    if n < 8:
        return None, None
    nperseg = min(int(n // 2), 256)
    freq, psd = signal.welch(data_array, fs=fs, nperseg=nperseg,
                             noverlap=nperseg // 2, window='hann',
                             detrend='constant')
    return freq, psd
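To see the estimator at work, one can feed it a synthetic track with a known 30 km wavelength; with 1 km sampling, the PSD peak should land near the corresponding wavenumber (the parameter choices mirror welch_psd_1D):

```python
import numpy as np
from scipy import signal

# Synthetic 1-D track: a 30 km wavelength sine sampled every 1 km, plus noise.
rng = np.random.default_rng(0)
x = np.arange(256.0)                               # distance (km), so fs = 1 cycle/km
data = np.sin(2 * np.pi * x / 30) + 0.1 * rng.standard_normal(x.size)

nperseg = min(len(data) // 2, 256)                 # same segment choice as welch_psd_1D
freq, psd = signal.welch(data, fs=1.0, nperseg=nperseg,
                         noverlap=nperseg // 2, window='hann',
                         detrend='constant')
peak_wavelength = 1.0 / freq[np.argmax(psd[1:]) + 1]   # skip the DC bin
print(peak_wavelength)
```

With nperseg = 128 the frequency resolution is 1/128 cycles/km, so the recovered peak falls on the bin nearest 1/30 cycles/km rather than exactly at it.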
The compute_segments_psd function extends our spectral analysis capability to handle real-world oceanographic data with gaps. It integrates the NaN-aware segmentation with Welch's PSD estimation to generate spectral estimates for multiple continuous segments within a potentially fragmented dataset.
This function operates through a multi-step process:
Segmentation: First, it calls split_on_nan to divide the input array into continuous segments separated by runs of NaN values:
segments = split_on_nan(data_array, max_nan)
Where max_nan (default: 4) determines the tolerance for isolated missing values.
Filtering: For each segment, it performs additional cleaning and validation:
seg_clean = seg[~np.isnan(seg)]
if len(seg_clean) < min_length:
    continue
This ensures that any remaining NaN values are removed, and segments shorter than min_length (default: 10) are discarded as they would not provide statistically reliable spectral estimates.
PSD Computation: For each valid segment, it computes the power spectral density using Welch's method:
f, p = welch_psd_1D(seg_clean, fs)
Where fs is the sampling frequency (default: 1.0, appropriate for spatial data with 1 km sampling).
Result Collection: Valid frequency-PSD pairs are collected for subsequent ensemble averaging:
results.append((f, p))
This approach has several advantages for oceanographic spectral analysis:
The parameter min_length=10 is particularly important as it ensures that each segment contains sufficient points for meaningful spectral estimation. With 1 km sampling, this corresponds to a minimum spatial extent of 10 km, which is appropriate for resolving submesoscale features while ensuring statistical robustness.
This segmented approach is essential for analyzing MVP and satellite altimetry data along transects that may contain data gaps due to sensor issues, land contamination, or other operational constraints.
### 5.1.3 PSD for segmented data
def compute_segments_psd(data_array, max_nan=4, fs=1.0, min_length=10):
    """
    Compute Welch PSD for each sufficiently long segment of 'data_array'.
    """
    segments = split_on_nan(data_array, max_nan)
    results = []
    for seg in segments:
        seg_clean = seg[~np.isnan(seg)]
        if len(seg_clean) < min_length:
            continue
        f, p = welch_psd_1D(seg_clean, fs)
        if f is not None and p is not None:
            results.append((f, p))
    return results
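Downstream, the collected (freq, psd) pairs are typically interpolated onto a common frequency grid and ensemble-averaged. A minimal sketch with made-up per-segment spectra (the grid choice here is illustrative, not the notebook's actual setting):

```python
import numpy as np

# Two hypothetical per-segment Welch outputs: (frequency in cycles/km, PSD).
results = [
    (np.linspace(0.0, 0.5, 9),  np.full(9, 2.0)),
    (np.linspace(0.0, 0.5, 17), np.full(17, 4.0)),
]

f_common = np.linspace(0.01, 0.5, 50)          # common frequency grid
psd_stack = np.vstack([np.interp(f_common, f, p) for f, p in results])
psd_mean = psd_stack.mean(axis=0)              # ensemble-average PSD
print(psd_mean[0])  # 3.0
```

Interpolating before averaging is what allows segments of different lengths (and hence different native frequency grids) to contribute to one ensemble spectrum.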
The loglog_regression function quantifies a fundamental characteristic of oceanographic spectra: their power-law scaling behavior across different spatial scales. This scaling is crucial for understanding the energy cascade in ocean dynamics and validating satellite observations against theoretical predictions.
Turbulent flows in geophysical fluid dynamics often exhibit power-law behavior in their energy spectra, described by:
$$E(k) \propto k^m$$
where $E(k)$ is the spectral energy density, $k$ is the wavenumber, and $m$ is the spectral slope.
In log-log space, this power-law relationship becomes linear:
$$\log(E(k)) = m\log(k) + c$$
The function implements this log-log regression by:
Filtering valid data points: It selects only points where both frequency and PSD are positive non-NaN values:
valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0) & (freq > 0)
This filtering is necessary because logarithms are only defined for positive values.
Checking data sufficiency: It requires at least two valid points to perform a meaningful regression:
if np.sum(valid) < 2:
return np.nan, np.nan
Performing linear regression in log-log space:
m, c = np.polyfit(np.log10(freq[valid]), np.log10(psd[valid]), 1)
This returns the slope m and intercept c of the best-fit line in log-log space.
In oceanographic contexts, the spectral slope carries specific physical interpretations tied to theories of geostrophic turbulence.
By quantifying these slopes for the different datasets (MVP, SWOT, DUACS), we can compare their scaling behaviour across spatial scales directly.
This analysis helps validate whether SWOT accurately reproduces the physical scaling laws observed in direct oceanographic measurements and expected from theoretical models.
### 5.1.4 Log–log regression of PSD
def loglog_regression(freq, psd):
    """
    Fit log10(PSD) = m log10(freq) + c and return slope m, intercept c.
    """
    valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0) & (freq > 0)
    if np.sum(valid) < 2:
        return np.nan, np.nan
    m, c = np.polyfit(np.log10(freq[valid]), np.log10(psd[valid]), 1)
    return m, c
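As a sanity check, the sketch below reruns loglog_regression (reproduced from the cell above so it runs standalone) on a noiseless synthetic power law; the fitted slope and intercept recover the generating parameters.

```python
import numpy as np

# Reproduced from the cell above so this sketch runs standalone.
def loglog_regression(freq, psd):
    valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0) & (freq > 0)
    if np.sum(valid) < 2:
        return np.nan, np.nan
    m, c = np.polyfit(np.log10(freq[valid]), np.log10(psd[valid]), 1)
    return m, c

k = np.logspace(-2, 0, 50)   # wavenumbers, cycles per km
E = 1e-3 * k**-2.0           # exact power law with slope -2
m, c = loglog_regression(k, E)
print(m, c)  # → -2.0 and -3.0 (up to floating-point error)
```

Since log10(E) = -3 - 2 log10(k) exactly, the regression returns m ≈ -2 and c ≈ -3, confirming that the slope reported on the figures is the exponent of the underlying power law.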
The split_on_nan_pair function extends our segmentation approach to handle paired datasets, which is essential for coherence analysis between MVP and satellite measurements. Unlike the single-array version, this function identifies continuous segments where both datasets have valid values simultaneously.
This synchronized segmentation is critical because coherence analysis requires corresponding measurements from both sources. The function introduces several key innovations:
Combined NaN masking: The function creates a unified mask that identifies points where data is missing in either array:
combined_nan = np.isnan(data1) | np.isnan(data2)
This ensures that segments only include points where both datasets have valid measurements.
Synchronized extraction: When a valid segment is identified, the function extracts corresponding portions from both input arrays and stores them as pairs:
segments.append((segm1, segm2))
Consistent gap tolerance:
The max_consecutive_nans parameter (default: 4) applies the same tolerance for gaps to both datasets, maintaining a consistent approach to segmentation.
This paired segmentation strategy has important implications for our coherence analysis: every coherence estimate is computed from strictly co-located samples.
By enforcing this strict paired segmentation, we create a solid foundation for robust coherence analysis that accurately reflects the relationship between in-situ and satellite measurements across different spatial scales.
### 5.2.1 Paired NaN‐aware splitting
def split_on_nan_pair(data1, data2, max_consecutive_nans=4):
    """
    Split paired arrays into segments, honouring NaNs in either input.
    """
    data1 = np.asarray(data1)
    data2 = np.asarray(data2)
    combined_nan = np.isnan(data1) | np.isnan(data2)
    segments = []
    n = len(data1)
    start_idx = 0
    nan_count = 0
    for i in range(n):
        nan_count = nan_count + 1 if combined_nan[i] else 0
        if nan_count >= max_consecutive_nans:
            # end the segment just before the NaN run started
            end_idx = i - nan_count + 1
            if end_idx > start_idx:
                segm1 = data1[start_idx:end_idx]
                segm2 = data2[start_idx:end_idx]
                if len(segm1) > 1:
                    segments.append((segm1, segm2))
            start_idx = i
            nan_count = 0
    if start_idx < n:
        segm1 = data1[start_idx:]
        segm2 = data2[start_idx:]
        if len(segm1) > 1:
            segments.append((segm1, segm2))
    return segments
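The behaviour is easiest to see on a toy pair of tracks. The sketch below is a condensed, standalone copy of the splitter (with the segment end taken at i - nan_count + 1, so the cut falls exactly where the NaN run begins); the data are illustrative only.

```python
import numpy as np

# Condensed, standalone paired splitter; the end index i - nan_count + 1
# makes the segment stop exactly where the NaN run begins.
def split_on_nan_pair(data1, data2, max_consecutive_nans=4):
    data1, data2 = np.asarray(data1, float), np.asarray(data2, float)
    combined_nan = np.isnan(data1) | np.isnan(data2)
    segments, start_idx, nan_count = [], 0, 0
    for i in range(len(data1)):
        nan_count = nan_count + 1 if combined_nan[i] else 0
        if nan_count >= max_consecutive_nans:
            end_idx = i - nan_count + 1
            if end_idx - start_idx > 1:
                segments.append((data1[start_idx:end_idx],
                                 data2[start_idx:end_idx]))
            start_idx, nan_count = i, 0
    if len(data1) - start_idx > 1:
        segments.append((data1[start_idx:], data2[start_idx:]))
    return segments

# A 4-NaN run in data1 splits the track; the isolated NaN in data2 does not
# (it is removed later by the per-segment masking step).
d1 = np.array([1, 2, 3, np.nan, np.nan, np.nan, np.nan, 4, 5, 6], float)
d2 = np.ones(10)
d2[8] = np.nan
segments = split_on_nan_pair(d1, d2, max_consecutive_nans=4)
print(len(segments))       # → 2
print(segments[0][0])      # → [1. 2. 3.]
```

Note that the trailing segment may still start with an isolated NaN; those points are removed by the mask_valid step in compute_segments_coherence before any spectra are computed.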
The welch_coherence_1D function implements the calculation of spectral coherence between two signals using Welch's method, providing a frequency-dependent measure of their linear relationship. Spectral coherence is a critical metric for validating SWOT data, as it quantifies how well the satellite captures the spatial patterns observed in MVP measurements across different scales.
Mathematically, the magnitude-squared coherence between two signals $x(t)$ and $y(t)$ is defined as:
$$C_{xy}(f) = \frac{|P_{xy}(f)|^2}{P_{xx}(f)P_{yy}(f)}$$
where $P_{xy}(f)$ is the cross-spectral density of the two signals and $P_{xx}(f)$, $P_{yy}(f)$ are their respective auto-spectral densities.
The coherence value ranges from 0 to 1: values near 1 indicate a strong linear relationship between the signals at that frequency, while values near 0 indicate no linear relationship.
The function utilizes SciPy's signal.coherence implementation, which applies Welch's method for consistent and robust coherence estimation. Key parameters include:
- nperseg: sets the length of each segment, balancing frequency resolution against statistical confidence
- window='hann': applies a Hann window to reduce spectral leakage
- noverlap=nperseg//2: uses 50% overlap between segments to ensure smooth transitions
- detrend='constant': removes the mean from each segment to eliminate DC offset effects

The minimum signal length requirement of 8 points ensures statistical reliability of the coherence estimate.
In the context of SWOT validation, coherence analysis reveals the range of spatial scales over which the satellite and in-situ measurements carry the same signal.
This coherence analysis is particularly valuable for understanding the effective resolution of SWOT measurements and identifying scales where additional calibration or improved processing might enhance agreement with in-situ observations.
### 5.2.2 Welch coherence
def welch_coherence_1D(signal1, signal2, fs=1.0):
    """
    Compute Welch spectral coherence between two signals.
    """
    n = len(signal1)
    if n < 8:
        return None, None
    nperseg = min(int(n // 2), 256)
    freq, coh = signal.coherence(signal1, signal2, fs=fs, window='hann',
                                 nperseg=nperseg, noverlap=nperseg // 2,
                                 detrend='constant')
    return freq, coh
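The settings above can be exercised directly with scipy.signal.coherence on synthetic data: two tracks sharing a sinusoid at f = 1/16 cycles per sample plus independent noise should be nearly perfectly coherent in that frequency bin.

```python
import numpy as np
from scipy import signal

# Two signals sharing a tone at f = 1/16 plus independent noise.
rng = np.random.default_rng(1)
n = 512
t = np.arange(n)
common = np.sin(2 * np.pi * t / 16)
x = common + 0.3 * rng.standard_normal(n)
y = common + 0.3 * rng.standard_normal(n)

# Same Welch settings as welch_coherence_1D above
nperseg = min(n // 2, 256)
freq, coh = signal.coherence(x, y, fs=1.0, window='hann',
                             nperseg=nperseg, noverlap=nperseg // 2,
                             detrend='constant')
idx = np.argmin(np.abs(freq - 1 / 16))  # bin containing the shared tone
print(freq[idx], coh[idx])  # coherence close to 1 at the shared frequency
```

In the noise-dominated bins the coherence drops toward the bias floor set by the number of averaged segments, which is why longer segments (more Welch averages) give more trustworthy low-coherence values.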
The compute_segments_coherence function extends our coherence analysis capability to handle real-world oceanographic data with gaps and discontinuities. It integrates the paired segmentation approach with Welch's coherence estimation to generate coherence estimates for multiple continuous segments where both datasets have valid measurements.
This function follows a structured approach:
Paired segmentation: It uses split_on_nan_pair to identify continuous segments where both signals have sufficient valid data:
segments = split_on_nan_pair(data1, data2, max_nan)
Additional cleaning: For each segment pair, it applies a final filtering step to remove any remaining NaN values:
mask_valid = (~np.isnan(seg1)) & (~np.isnan(seg2))
seg1_clean = seg1[mask_valid]
seg2_clean = seg2[mask_valid]
Length validation: It enforces a minimum segment length to ensure statistical reliability:
if len(seg1_clean) < min_length:
continue
Coherence computation: For each valid segment pair, it calculates the coherence spectrum using Welch's method:
f, co = welch_coherence_1D(seg1_clean, seg2_clean, fs=fs)
Result collection: Valid frequency-coherence pairs are collected for subsequent ensemble averaging.
This segmented approach to coherence estimation provides several advantages, most notably that every sufficiently long stretch of paired data contributes an estimate rather than a single gap invalidating the whole transect.
The parameter min_length=10 ensures that coherence estimates are based on sufficiently long segments to provide meaningful information about the scale-dependent relationship between in-situ and satellite measurements.
In our validation framework, this function enables us to quantify how well SWOT captures the spatial patterns observed by MVP at different scales, providing critical insight into the satellite's effective resolution and accuracy.
### 5.2.3 Coherence for segmented data
def compute_segments_coherence(data1, data2, max_nan=4, fs=1.0, min_length=10):
    """
    Compute Welch coherence for each paired segment of (data1, data2).
    """
    segments = split_on_nan_pair(data1, data2, max_nan)
    results = []
    for seg1, seg2 in segments:
        mask_valid = (~np.isnan(seg1)) & (~np.isnan(seg2))
        seg1_clean = seg1[mask_valid]
        seg2_clean = seg2[mask_valid]
        if len(seg1_clean) < min_length:
            continue
        f, co = welch_coherence_1D(seg1_clean, seg2_clean, fs=fs)
        if f is not None and co is not None:
            results.append((f, co))
    return results
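To see the whole paired-coherence chain run end to end, the standalone sketch below condenses the three helpers from sections 5.2.1-5.2.3 (the paired splitter again ends segments at i - nan_count + 1) and applies them to two correlated tracks with a shared gap; names and data are illustrative only.

```python
import numpy as np
from scipy import signal

# Condensed helpers from sections 5.2.1-5.2.3 (standalone sketch).
def split_on_nan_pair(data1, data2, max_consecutive_nans=4):
    data1, data2 = np.asarray(data1, float), np.asarray(data2, float)
    combined_nan = np.isnan(data1) | np.isnan(data2)
    segments, start_idx, nan_count = [], 0, 0
    for i in range(len(data1)):
        nan_count = nan_count + 1 if combined_nan[i] else 0
        if nan_count >= max_consecutive_nans:
            end_idx = i - nan_count + 1
            if end_idx - start_idx > 1:
                segments.append((data1[start_idx:end_idx],
                                 data2[start_idx:end_idx]))
            start_idx, nan_count = i, 0
    if len(data1) - start_idx > 1:
        segments.append((data1[start_idx:], data2[start_idx:]))
    return segments

def welch_coherence_1D(s1, s2, fs=1.0):
    n = len(s1)
    if n < 8:
        return None, None
    nperseg = min(n // 2, 256)
    return signal.coherence(s1, s2, fs=fs, window='hann',
                            nperseg=nperseg, noverlap=nperseg // 2,
                            detrend='constant')

def compute_segments_coherence(data1, data2, max_nan=4, fs=1.0, min_length=10):
    results = []
    for seg1, seg2 in split_on_nan_pair(data1, data2, max_nan):
        mask = (~np.isnan(seg1)) & (~np.isnan(seg2))
        s1, s2 = seg1[mask], seg2[mask]
        if len(s1) < min_length:
            continue
        f, co = welch_coherence_1D(s1, s2, fs=fs)
        if f is not None and co is not None:
            results.append((f, co))
    return results

# Two correlated tracks with a shared 6-point gap in the first one
rng = np.random.default_rng(2)
x = np.arange(80, dtype=float)
a = np.sin(2 * np.pi * x / 10) + 0.2 * rng.standard_normal(80)
b = np.sin(2 * np.pi * x / 10) + 0.2 * rng.standard_normal(80)
a[30:36] = np.nan
results = compute_segments_coherence(a, b, max_nan=4, fs=1.0, min_length=10)
print(len(results))  # → 2: one coherence spectrum per gap-free stretch
```

The gap splits the track into two valid stretches, and each contributes its own (frequency, coherence) pair, mirroring what the accumulators in section 6 collect per transect.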
This section sets up the data structures needed to accumulate spectral results across multiple MVP transects. By analyzing all available transects together, we can generate statistically robust ensemble spectra that capture the general characteristics of the region while minimizing the influence of transect-specific anomalies.
The code initializes two types of containers:
Power Spectral Density (PSD) containers:
- global_freq_mvp, global_psd_mvp: for in-situ MVP measurements
- global_freq_swot, global_psd_swot: for SWOT satellite data
- global_freq_duacs, global_psd_duacs: for DUACS multi-mission altimetry

Spectral Coherence containers:
- global_freq_coh_mvp_swot, global_coh_mvp_swot: for MVP-SWOT coherence
- global_freq_coh_mvp_duacs, global_coh_mvp_duacs: for MVP-DUACS coherence
- global_freq_coh_swot_duacs, global_coh_swot_duacs: for SWOT-DUACS coherence

These global accumulators will store the spectral results from each transect (PL5 through PL24), which will later be combined through ensemble averaging to produce our final spectral estimates.
The approach of processing multiple transects (20 in total) provides several advantages:
Statistical robustness: By combining data from many independent transects, we reduce the influence of random variability and measurement errors.
Spatial coverage: The transects collectively sample a broader area of the Northwestern Mediterranean, capturing a more representative picture of the region's dynamics.
Feature diversity: Different transects may intersect various oceanographic features (eddies, fronts, etc.), providing a comprehensive view of how well SWOT captures different types of structures.
This ensemble approach is particularly important for spectral analysis, where averaging across multiple independent estimates significantly improves the statistical confidence in the resulting spectra and coherence functions.
## 6.1 Initialisation of accumulators
PLs = np.arange(5, 25) # profiling lines 5 – 24
# Power-spectral-density (PSD) containers
global_freq_mvp, global_psd_mvp = [], []
global_freq_swot, global_psd_swot = [], []
global_freq_duacs, global_psd_duacs = [], []
# Spectral-coherence containers
global_freq_coh_mvp_swot, global_coh_mvp_swot = [], []
global_freq_coh_mvp_duacs, global_coh_mvp_duacs = [], []
global_freq_coh_swot_duacs, global_coh_swot_duacs = [], []
This section implements the core processing loop that systematically analyzes each MVP transect (PL5 through PL24), computing spectral characteristics and accumulating results for ensemble analysis.
The first step in processing each transect is loading the MVP data and computing dynamic height fields. This process follows established oceanographic principles:
Data loading: The code loads two complementary MATLAB files: the L2 file, which carries the transect metadata (including profile dates), and the L2_python file, which holds the longitude, latitude, pressure, temperature, and salinity arrays.
Date extraction: The function converts MATLAB dates to Python datetime objects using the previously defined utility function.
Thermodynamic calculations: Following the TEOS-10 standard, the code converts practical salinity to Absolute Salinity (gsw.SA_from_SP) and in-situ temperature to Conservative Temperature (gsw.CT_from_t).
Dynamic height calculation: For each valid profile in the transect, the code selects the non-NaN samples, takes the deepest available pressure as the reference level, integrates with gsw.geo_strf_dyn_height, and divides by g to express the result in metres.
Anomaly calculation: Finally, the code computes centered anomalies by subtracting the mean:
dhc = dh_array - np.nanmean(dh_array)
This centered dynamic height anomaly (dhc) is particularly important for spectral analysis, as it removes the arbitrary mean offset and leaves only the spatial variations that the spectra are meant to characterize.
The implementation includes robust error handling to skip transects with invalid dates or where all dynamic height values are NaN, ensuring that only viable data contributes to our spectral analysis.
## 6.2 Iterative processing
for PL in PLs:
    print(f"\n[SPECTRAL ANALYSIS] – PL{PL} in progress...")
    try:
        # ### 6.2.1 MVP data loading and dynamic-height computation
        mat_file = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2/PL{PL}.mat')
        date_array = mat_file['metadata']['date']
        dates_mvp = read_mat_datetime(date_array)
        if len(dates_mvp) == 0:
            raise ValueError(f"No valid date in PL{PL}")
        mat = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2_python/PL{PL}_python.mat')
        lon_mvp = mat['lon']
        lat_mvp = mat['lat']
        p_mvp = mat['pres']
        t_mvp = mat['temp']
        SP_mvp = mat['sal']
        SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
        CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
        g_ = 9.81
        dh_list = []
        for i in range(SA.shape[0]):
            valid_ = ~np.isnan(SA[i])
            if np.sum(valid_) == 0:
                dh_list.append(np.nan)
                continue
            p_ref = np.nanmax(p_mvp[i][valid_])  # p_ref = p_max
            if np.isnan(p_ref):
                dh_list.append(np.nan)
                continue
            dh_i = gsw.geo_strf_dyn_height(SA[i][valid_], CT[i][valid_],
                                           p_mvp[i][valid_], p_ref=p_ref)[0] / g_
            if isinstance(dh_i, np.ndarray):
                dh_i = dh_i.item() if dh_i.size == 1 else dh_i[0]
            dh_list.append(dh_i)
        dh_array = np.array(dh_list, dtype=float)
        if np.all(np.isnan(dh_array)):
            raise ValueError(f"PL{PL}: All dh are NaN")
        dhc = dh_array - np.nanmean(dh_array)  # centred anomaly
After computing dynamic height, the code organizes the MVP data into a structured format and calculates the cumulative distance along the transect. This spatial referencing is essential for spectral analysis, as it establishes the physical distance over which variations occur.
The process involves two main steps:
Data organization: The code creates a pandas DataFrame containing the dynamic height (dh), its centered anomaly (dhc), and the longitude and latitude of each profile. This structured format facilitates subsequent processing and analysis while ensuring that the different variables remain properly aligned.
Cumulative distance calculation: For each point along the transect, the code calculates the geodesic distance from the starting point:
dist_along[i] = dist_along[i-1] + geopy.distance.distance(
    (df_mvp_raw['lat'].values[i-1], df_mvp_raw['lon'].values[i-1]),
    (df_mvp_raw['lat'].values[i], df_mvp_raw['lon'].values[i])
).km
This calculation uses geopy's geodesic distance function, which accounts for Earth's curvature to provide accurate distances even for longer transects.
The resulting distance metric serves as the independent variable for our spectral analysis, representing the spatial dimension along which variations in dynamic height are analysed.
By converting from geographic coordinates to distance along track, we transform the data into a form suitable for one-dimensional spectral analysis, enabling the identification of characteristic spatial scales in the oceanographic features observed along the transect.
### 6.2.2 MVP transect dataframe and distance along track
        n_profiles = SA.shape[0]
        if lon_mvp.shape[1] < n_profiles:
            raise ValueError(f"PL{PL}: lon_mvp dimension mismatch")
        df_mvp_raw = pd.DataFrame({
            'dh': dh_array,
            'dhc': dhc,
            'lon': lon_mvp[0, :n_profiles],
            'lat': lat_mvp[0, :n_profiles]
        })
        dist_along = np.zeros(n_profiles)
        for i in range(1, n_profiles):
            dist_along[i] = dist_along[i-1] + geopy.distance.distance(
                (df_mvp_raw['lat'].values[i-1], df_mvp_raw['lon'].values[i-1]),
                (df_mvp_raw['lat'].values[i], df_mvp_raw['lon'].values[i])
            ).km
        df_mvp_raw['dist'] = dist_along
This step transforms the irregularly spaced MVP measurements into a regular grid with uniform 1 km spacing. Unlike in previous analyses, no smoothing is applied here, preserving the high-frequency content essential for spectral analysis.
The resampling process involves:
Regular distance grid creation: The code creates a uniform distance grid from 0 to the maximum transect length in 1 km increments:
dist_reg = np.arange(0, dist_max + step_km, step_km)
Linear interpolation: All variables (longitude, latitude, dynamic height) are interpolated onto this regular grid using simple linear interpolation:
lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
Dataframe creation: The interpolated values are organized into a new DataFrame with regular spacing.
This regular resampling is crucial because Fourier-based methods such as Welch's assume uniformly spaced samples, and a fixed 1 km step maps frequency directly onto spatial scale.
Importantly, this step applies only interpolation, without smoothing. This contrasts with our previous validation analyses, where smoothing helped reduce noise; for spectral analysis we deliberately preserve the high-frequency content, since smoothing would artificially damp the small-scale end of the spectrum.
With this regular sampling, we've prepared the MVP data for spectral analysis while preserving its full frequency content. This will allow us to accurately assess how energy is distributed across different spatial scales in the original measurements.
### 6.2.3 Regular 1 km re-sampling (no smoothing)
        step_km = 1
        dist_max = df_mvp_raw['dist'].iloc[-1]
        dist_reg = np.arange(0, dist_max + step_km, step_km)
        lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
        lat_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
        dh_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
        dhc_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])
        df_mvp = pd.DataFrame({
            'dist': dist_reg,
            'lon': lon_reg,
            'lat': lat_reg,
            'dh': dh_reg,
            'dhc': dhc_reg
        })
This section matches satellite altimetry data from both SWOT and DUACS to the MVP transect points, enabling direct spectral comparison between in-situ and satellite measurements. The colocation process ensures that all three datasets sample the same oceanographic features along the transect.
For each transect, the code:
- loads SWOT ADT at the MVP positions for the transect date using the load_SWOT function
- loads DUACS ADT in the same way using the load_DUACS function
- centers each field by subtracting its along-track mean

The centering operation is particularly important for spectral analysis, as it removes the mean offset so that only spatial variations contribute to the spectra.
The code includes robust error handling with fallback to NaN arrays if satellite data is unavailable for the transect date. This ensures that missing satellite coverage doesn't halt the processing of valid MVP data.
This three-way colocation creates a powerful validation framework in which in-situ MVP dynamic height anomalies, SWOT ADT anomalies, and DUACS ADT anomalies can be compared directly along the same track.
By analyzing the spectral characteristics of these colocated measurements, we can assess how well each satellite product captures the spatial variations observed in the in-situ data across different scales, with particular focus on SWOT's performance in resolving submesoscale features.
### 6.2.4 External ADT colocation (SWOT & DUACS)
        dayv_str = dates_mvp[0].strftime('%Y-%m-%d')
        adt_swot = load_SWOT(df_mvp, dayv_str,
                             '/home/favaloro/Téléchargements/SWOT_V2.0.0/v2.0.0/', '')
        adt_swot = np.full(len(df_mvp), np.nan) if adt_swot is None else adt_swot - np.nanmean(adt_swot)
        adt_duacs = load_DUACS(df_mvp, dayv_str,
                               '/home/bessiere/Documents/DUACS/', '')
        adt_duacs = np.full(len(df_mvp), np.nan) if adt_duacs is None else adt_duacs - np.nanmean(adt_duacs)
With all three datasets prepared and aligned, this step computes the power spectral density (PSD) for each valid segment along the transect. This reveals how energy is distributed across different spatial scales in the MVP, SWOT, and DUACS measurements.
For each dataset, the code:
- calls compute_segments_psd to identify continuous segments and calculate their PSDs
- uses consistent parameters across all datasets:
  - max_nan=4: tolerates up to 4 consecutive missing values
  - fs=1.0: sets the sampling frequency to 1 sample per km
  - min_length=MIN_SEGMENT_LENGTH: ensures segments have at least 10 valid points
- stores the resulting frequency-PSD pairs in the global accumulators initialized earlier
This segmented approach to spectral estimation is particularly valuable for oceanographic data, because every usable stretch of the transect contributes rather than a single gap invalidating the record.
The frequency range of the analysis is set by the sampling: with 1 km spacing, the Nyquist frequency is 0.5 cycles/km (a 2 km wavelength), while the segment lengths bound the lowest resolvable frequencies.
By accumulating spectral estimates from all transects, we build a comprehensive picture of how energy is distributed across spatial scales in each measurement system, including overall energy levels, spectral slopes, and the scales at which the satellite products depart from the in-situ signal.
These spectral characteristics are fundamental to understanding the oceanographic processes captured by each system and evaluating SWOT's ability to resolve fine-scale features.
### 6.2.5 PSD computation and storage
        segments_psd_mvp = compute_segments_psd(df_mvp['dhc'].values, max_nan=4,
                                                fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, p in segments_psd_mvp:
            global_freq_mvp.append(f)
            global_psd_mvp.append(p)
        segments_psd_swot = compute_segments_psd(adt_swot, max_nan=4,
                                                 fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, p in segments_psd_swot:
            global_freq_swot.append(f)
            global_psd_swot.append(p)
        segments_psd_duacs = compute_segments_psd(adt_duacs, max_nan=4,
                                                  fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, p in segments_psd_duacs:
            global_freq_duacs.append(f)
            global_psd_duacs.append(p)
This final processing step computes the spectral coherence between each pair of datasets, providing a frequency-dependent measure of their linear relationship. Coherence analysis is particularly valuable as it reveals at which spatial scales the different measurement systems capture the same oceanographic signals.
The code computes three distinct coherence relationships: MVP-SWOT, MVP-DUACS, and SWOT-DUACS.
For each pair, the code:
- calls compute_segments_coherence to identify valid segment pairs and calculate their coherence spectra
- maintains consistent parameters across all comparisons:
  - max_nan=4: tolerates up to 4 consecutive missing values
  - fs=1.0: sets the sampling frequency to 1 sample per km
  - min_length=MIN_SEGMENT_LENGTH: ensures segments have at least 10 valid points
- stores the resulting frequency-coherence pairs in the global accumulators

Coherence values range from 0 to 1: values near 1 indicate that the two datasets capture the same signal at that scale, while values near 0 indicate no linear relationship.
This coherence analysis framework lets us quantify, scale by scale, how closely each pair of measurement systems agrees.
The try-except block ensures that errors in processing individual transects don't halt the entire analysis, allowing the code to continue with the next transect and maximize data utilization.
By accumulating coherence estimates from all available transects, we build a statistically robust picture of the scale-dependent relationships between these different measurement systems.
### 6.2.6 Coherence computation and storage
        segments_coh_mvp_swot = compute_segments_coherence(df_mvp['dhc'].values,
                                                           adt_swot, max_nan=4,
                                                           fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, co in segments_coh_mvp_swot:
            global_freq_coh_mvp_swot.append(f)
            global_coh_mvp_swot.append(co)
        segments_coh_mvp_duacs = compute_segments_coherence(df_mvp['dhc'].values,
                                                            adt_duacs, max_nan=4,
                                                            fs=1.0, min_length=MIN_SEGMENT_LENGTH)
        for f, co in segments_coh_mvp_duacs:
            global_freq_coh_mvp_duacs.append(f)
            global_coh_mvp_duacs.append(co)
        segments_coh_swot_duacs = compute_segments_coherence(adt_swot, adt_duacs,
                                                             max_nan=4, fs=1.0,
                                                             min_length=MIN_SEGMENT_LENGTH)
        for f, co in segments_coh_swot_duacs:
            global_freq_coh_swot_duacs.append(f)
            global_coh_swot_duacs.append(co)
    except Exception as e:
        print(f"Error in PL{PL}: {e}")
        continue
This section implements the statistical methods needed to combine spectral estimates from multiple segments and transects into robust ensemble averages. Ensemble averaging is a fundamental technique in spectral analysis that reduces statistical variance while preserving the underlying spectral characteristics.
The code defines two parallel helper functions:
- ensemble_average_psd: combines multiple PSD estimates into a single ensemble-averaged spectrum
- ensemble_average_coherence: similarly combines multiple coherence estimates

Both functions follow the same core methodology: adopt the frequency grid of the first estimate, interpolate every subsequent estimate onto that common grid, and average with np.nanmean.
The interpolation step is particularly important because segments of different lengths yield Welch estimates on different frequency grids, and averaging is only meaningful once all estimates share a common grid.
Mathematically, the ensemble averaging process can be represented as:
$$\bar{P}(f) = \frac{1}{K} \sum_{i=1}^{K} P_i(f)$$
where $K$ is the number of individual estimates and $P_i(f)$ is the $i$-th PSD (or coherence) spectrum interpolated onto the common frequency grid.
This ensemble averaging reduces the variance of the final estimate, with the reduction growing as more independent segments are averaged.
By applying these ensemble averaging methods to our accumulated spectral and coherence estimates, we can generate statistically robust results that accurately characterize the spectral properties of our oceanographic measurements.
## 7.1 Ensemble-average helpers
def ensemble_average_psd(freq_list, psd_list):
    """
    Return common-grid frequency and ensemble-mean PSD.
    """
    if len(freq_list) == 0:
        return None, None
    freq_common = freq_list[0][freq_list[0] > 0]
    psd_interp_all = [psd_list[0][freq_list[0] > 0]]
    for freq_, psd_ in zip(freq_list[1:], psd_list[1:]):
        valid = freq_ > 0
        # use the valid subset so both branches return arrays on freq_common
        psd_interp = psd_[valid] if np.array_equal(freq_[valid], freq_common) else \
            np.interp(freq_common, freq_[valid], psd_[valid],
                      left=np.nan, right=np.nan)
        psd_interp_all.append(psd_interp)
    return freq_common, np.nanmean(np.array(psd_interp_all), axis=0)

def ensemble_average_coherence(freq_list, coh_list):
    """
    Return common-grid frequency and ensemble-mean coherence.
    """
    if len(freq_list) == 0:
        return None, None
    freq_common = freq_list[0][freq_list[0] > 0]
    coh_interp_all = [coh_list[0][freq_list[0] > 0]]
    for freq_, coh_ in zip(freq_list[1:], coh_list[1:]):
        valid = freq_ > 0
        coh_interp = coh_[valid] if np.array_equal(freq_[valid], freq_common) else \
            np.interp(freq_common, freq_[valid], coh_[valid],
                      left=np.nan, right=np.nan)
        coh_interp_all.append(coh_interp)
    return freq_common, np.nanmean(np.array(coh_interp_all), axis=0)
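As a quick check of the interpolation logic, here is a condensed, runnable sketch of ensemble_average_psd (using the valid-frequency subset in the equal-grid branch) applied to two synthetic estimates on offset frequency grids; the second estimate is linearly interpolated onto the grid of the first before averaging.

```python
import numpy as np

# Condensed from the cell above for a standalone run.
def ensemble_average_psd(freq_list, psd_list):
    if len(freq_list) == 0:
        return None, None
    freq_common = freq_list[0][freq_list[0] > 0]
    psd_interp_all = [psd_list[0][freq_list[0] > 0]]
    for freq_, psd_ in zip(freq_list[1:], psd_list[1:]):
        valid = freq_ > 0
        if np.array_equal(freq_[valid], freq_common):
            psd_interp = psd_[valid]
        else:
            psd_interp = np.interp(freq_common, freq_[valid], psd_[valid],
                                   left=np.nan, right=np.nan)
        psd_interp_all.append(psd_interp)
    return freq_common, np.nanmean(np.array(psd_interp_all), axis=0)

f1 = np.array([0.0, 0.1, 0.2, 0.3])
p1 = np.array([1.0, 2.0, 3.0, 4.0])
f2 = np.array([0.05, 0.15, 0.25, 0.35])   # offset grid → interpolation branch
p2 = np.array([4.0, 5.0, 6.0, 7.0])
freq, psd = ensemble_average_psd([f1, f2], [p1, p2])
print(freq)  # → [0.1 0.2 0.3]
print(psd)   # → [3.25 4.25 5.25]
```

The zero-frequency bin of the first grid is discarded, the second estimate is interpolated to (4.5, 5.5, 6.5), and the ensemble mean of the two estimates is returned on the common grid.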
After processing all transects, this section applies the ensemble averaging methods to our accumulated spectral estimates, generating the final power spectral density (PSD) and coherence spectra that will form the basis of our analysis.
The ensemble averaging is applied to six distinct spectral quantities:
- Three PSD estimates: MVP dynamic height, SWOT ADT, and DUACS ADT
- Three coherence relationships: MVP-SWOT, MVP-DUACS, and SWOT-DUACS
Each ensemble average combines results from multiple segments across all available transects, providing statistically robust spectral estimates that represent the general characteristics of the Northwestern Mediterranean region.
The final conversion from m² to cm² (multiplying by 10⁴) brings the PSD into the cm² (cycles/km)⁻¹ units conventionally used for sea surface height spectra, making the values easier to read and compare.
This unit conversion doesn't affect the spectral shape or coherence values, only the absolute magnitude of the PSD.
These ensemble-averaged spectra form the basis of the comparative analyses that follow.
The results will provide comprehensive insight into SWOT's ability to resolve oceanographic features across different spatial scales, particularly in the submesoscale range that has traditionally been challenging for satellite altimetry.
## 7.2 Ensemble PSD and coherence computation
freq_mvp_ens, psd_mvp_ens = ensemble_average_psd(global_freq_mvp, global_psd_mvp)
freq_swot_ens, psd_swot_ens = ensemble_average_psd(global_freq_swot, global_psd_swot)
freq_duacs_ens, psd_duacs_ens = ensemble_average_psd(global_freq_duacs, global_psd_duacs)
freq_coh_ens_mvp_swot, coh_ens_mvp_swot = ensemble_average_coherence(global_freq_coh_mvp_swot, global_coh_mvp_swot)
freq_coh_ens_mvp_duacs, coh_ens_mvp_duacs = ensemble_average_coherence(global_freq_coh_mvp_duacs, global_coh_mvp_duacs)
freq_coh_ens_swot_duacs, coh_ens_swot_duacs = ensemble_average_coherence(global_freq_coh_swot_duacs, global_coh_swot_duacs)
# Convert m² (cyc km)⁻¹ → cm² (cyc km)⁻¹
if psd_mvp_ens is not None: psd_mvp_ens *= 1.0e4
if psd_swot_ens is not None: psd_swot_ens *= 1.0e4
if psd_duacs_ens is not None: psd_duacs_ens *= 1.0e4
The visualization section begins with the creation of a comprehensive figure that will display the power spectral density (PSD) and coherence results. This figure is designed to provide a complete picture of the spectral characteristics of our three measurement systems across different spatial scales.
The code initializes a figure with dimensions 8×6 inches, providing sufficient space to clearly display the complex spectral information while maintaining readability. A single subplot is created to contain all spectral curves, coherence relationships, and background elements.
This unified presentation approach keeps all curves in a single frame for direct comparison.
The figure will incorporate multiple elements: the three ensemble PSD curves, their fitted regression lines, theoretical reference slopes, and shaded backgrounds marking the spatial-scale regimes.
This integrated visualization approach is particularly valuable for spectral analysis, as it allows us to simultaneously assess energy distribution, scaling behavior, and inter-dataset relationships across the full range of spatial scales represented in our data.
### 8.1.1 Figure and Axes
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
This visualization element establishes a clear reference framework for interpreting the spectral results by dividing the frequency domain into three distinct oceanographic scale regimes. These colored background panels serve as both a visual guide and a conceptual framework, helping to contextualize the spectral characteristics within established oceanographic paradigms.
The three scale regimes are:
- Large scale (>100 km), yellow background
- Mesoscale (10-100 km), green background
- Small scale (<10 km), red background
These scale definitions align with established oceanographic conventions and focus attention on the critical mesoscale-to-submesoscale transition region (around 10-30 km) where SWOT is expected to show significant improvements over conventional altimetry.
By using subtle background shading (alpha=0.1) with low z-order (zorder=0), the code ensures these reference panels don't interfere with the spectral curves while providing important context for interpreting the results. This visual framework will help readers immediately recognize which features fall within each dynamical regime and facilitate discussion of scale-dependent satellite performance.
### 8.1.2 Spatial‐scale Background
ax.axvspan(1e-6, 0.01, color='yellow', alpha=0.1, zorder=0, label='Large scale (>100 km)')
ax.axvspan(0.01, 0.1, color='green', alpha=0.1, zorder=0, label='Mesoscale (10-100 km)')
ax.axvspan(0.1, 1e2, color='red', alpha=0.1, zorder=0, label='Small scale (<10 km)')
This section defines a utility function for plotting power spectral density curves and establishes a consistent color scheme for the different datasets. These elements ensure visual clarity and consistency throughout the visualization.
The plot_psd_line function filters out NaN and non-positive values, skips curves with fewer than two valid points, and draws each PSD on logarithmic axes with a consistent style.
The color selection is based on the viridis colormap, which is perceptually uniform, remains readable under common forms of colour-vision deficiency, and degrades gracefully in grayscale.
By sampling three points from this colormap (at positions 0.3, 0.5, and 0.7), the code creates a visually harmonious set of colors that clearly differentiate the three datasets while maintaining a cohesive aesthetic.
The resulting color assignments are:
- color_mvp: for in-situ MVP measurements (perceptually the "first" color)
- color_swot: for SWOT satellite data (perceptually the "middle" color)
- color_duacs: for DUACS multi-mission altimetry (perceptually the "last" color)

This systematic approach to color selection enhances the readability of the complex multi-curve visualization while ensuring aesthetic quality, accessibility, and scientific rigor.
### 8.1.3 Helper for PSD Lines
def plot_psd_line(freq, psd, color, label):
    if freq is not None and psd is not None:
        valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0)
        if np.sum(valid) < 2:
            return None
        line, = ax.loglog(freq[valid], psd[valid], color=color, linestyle='-', label=label)
        return line
    return None

colors = plt.cm.viridis(np.linspace(0.3, 0.7, 3))  # colour-map sampling
color_mvp, color_swot, color_duacs = colors[0], colors[1], colors[2]
This section plots the ensemble-averaged power spectral density curves for all three measurement systems. These curves represent the core results of our spectral analysis, revealing how energy is distributed across different spatial scales in each dataset.
The code uses the previously defined plot_psd_line helper function to plot three distinct spectral curves:
- MVP PSD (in the first viridis color)
- SWOT PSD (in the middle viridis color)
- DUACS PSD (in the third viridis color)

These PSD curves are plotted on logarithmic axes (using loglog), the standard convention in spectral analysis: power-law behaviour appears as straight lines, and many decades of frequency and energy can be displayed at once.
By examining these curves we can compare energy levels, spectral slopes, and the scales at which the satellite products diverge from the in-situ measurements.
These insights are crucial for understanding SWOT's capabilities and limitations in capturing oceanographic features across different spatial scales.
### 8.1.4 PSD Curves
line_mvp = plot_psd_line(freq_mvp_ens, psd_mvp_ens, color_mvp, 'MVP PSD')
line_swot = plot_psd_line(freq_swot_ens, psd_swot_ens, color_swot, 'SWOT PSD')
line_duacs = plot_psd_line(freq_duacs_ens, psd_duacs_ens, color_duacs,'DUACS PSD')
This section enhances the visualization by adding best-fit power-law trend lines to each PSD curve. These regression lines quantify the spectral slopes, which are key indicators of the underlying physical processes and energy cascade characteristics in the ocean.
For each dataset (MVP, SWOT, and DUACS), the code fits a power law to the ensemble PSD with the loglog_regression function and overlays the best-fit line, labelled with the fitted slope.
In oceanographic spectral analysis these slopes carry specific physical interpretations linked to turbulence theory, so comparing the fitted slopes across the three datasets shows how faithfully each product reproduces the energy distribution observed in situ.
The code handles potential errors by checking for valid data and only plotting regression lines when meaningful fits can be obtained. The resulting handles are collected in handles_slopes for later inclusion in the legend.
These regression lines provide a quantitative foundation for interpreting the spectral results, moving beyond visual comparison to precise characterization of the energy distribution across scales in each measurement system.
### 8.1.5 Log–log Regression Lines
handles_slopes = []
if freq_mvp_ens is not None and psd_mvp_ens is not None:
m_mvp, c_mvp = loglog_regression(freq_mvp_ens, psd_mvp_ens)
if not np.isnan(m_mvp):
reg_line_mvp, = ax.loglog(freq_mvp_ens, 10**c_mvp * freq_mvp_ens**m_mvp,
color=color_mvp, linestyle='--',
label=f"MVP slope = {m_mvp:.2f}")
handles_slopes.append(reg_line_mvp)
if freq_swot_ens is not None and psd_swot_ens is not None:
m_swot, c_swot = loglog_regression(freq_swot_ens, psd_swot_ens)
if not np.isnan(m_swot):
reg_line_swot, = ax.loglog(freq_swot_ens, 10**c_swot * freq_swot_ens**m_swot,
color=color_swot, linestyle='--',
label=f"SWOT slope = {m_swot:.2f}")
handles_slopes.append(reg_line_swot)
if freq_duacs_ens is not None and psd_duacs_ens is not None:
m_duacs, c_duacs = loglog_regression(freq_duacs_ens, psd_duacs_ens)
if not np.isnan(m_duacs):
reg_line_duacs, = ax.loglog(freq_duacs_ens, 10**c_duacs * freq_duacs_ens**m_duacs,
color=color_duacs, linestyle='--',
label=f"DUACS slope = {m_duacs:.2f}")
handles_slopes.append(reg_line_duacs)
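loglog_regression is defined earlier in the notebook; a minimal sketch of its assumed behavior is a least-squares fit of log10(PSD) against log10(f), returning slope m and intercept c so that PSD ≈ 10**c · f**m, consistent with the `10**c * freq**m` reconstruction used above (the exact implementation may differ):

```python
import numpy as np

def loglog_regression(freq, psd):
    # Fit log10(psd) = m*log10(freq) + c over valid, strictly positive points.
    valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (freq > 0) & (psd > 0)
    if np.sum(valid) < 2:
        return np.nan, np.nan
    m, c = np.polyfit(np.log10(freq[valid]), np.log10(psd[valid]), 1)
    return m, c

f = np.logspace(-3, 0, 50)
m, c = loglog_regression(f, 4.0 * f**-2)  # synthetic f^-2 spectrum
print(round(m, 2))  # → -2.0
```

Because an exact power law is linear in log-log space, the fit recovers the synthetic slope exactly, which makes this a convenient self-test.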
This section adds theoretical reference slopes to the plot, providing important context for interpreting the observed spectral characteristics in relation to established oceanographic turbulence theories.
The code adds two reference lines with specific slopes: slope -3 (black dotted line), associated with interior quasi-geostrophic (QG) turbulence, and slope -2 (gray dotted line), associated with surface quasi-geostrophic (SQG) dynamics.
These reference lines are anchored to the observed data (scaled to 80% of the first MVP PSD value) so they overlay the measured spectra. By including these theoretical references, the visualization enables direct comparison between observed spectral slopes and theoretical predictions, helping to assess which turbulence regime best describes each range of scales.
This approach connects the empirical spectral analysis to fundamental oceanographic theory, providing a deeper physical interpretation of the results. The reference lines are particularly valuable for evaluating whether SWOT captures the theoretically expected energy cascade characteristics across the mesoscale-to-submesoscale transition.
### 8.1.6 Reference‐slope Lines
f_theory = np.logspace(-3, 0, 200)  # log-spaced so the reference lines plot straight on log-log axes
ref_val = psd_mvp_ens[0] * 0.8 if (psd_mvp_ens is not None) and (len(psd_mvp_ens) > 0) else 1e4
theory_line_m3, = ax.loglog(f_theory, ref_val * (f_theory/f_theory[0])**(-3),
color='black', linestyle=':', label="Ref slope -3")
theory_line_m2, = ax.loglog(f_theory, ref_val * (f_theory/f_theory[0])**(-2),
color='gray', linestyle=':', label="Ref slope -2")
handles_slopes.extend([theory_line_m3, theory_line_m2])
This section adds a secondary x-axis that displays spatial wavelength instead of frequency, making the plot more intuitive for oceanographers who typically think in terms of feature size rather than spatial frequency.
While the primary x-axis represents frequency in cycles per kilometer, the secondary axis shows the corresponding wavelength in kilometers. These are inversely related:
$$\text{Wavelength} = \frac{1}{\text{Frequency}}$$
The implementation uses matplotlib's secondary_xaxis feature with a pair of conversion functions, lambda f: 1/f and lambda w: 1/w; since the reciprocal transform is its own inverse, the same expression serves both directions.
This dual representation offers several advantages:
Intuitive interpretation: Oceanographers can directly read off the spatial scales of features (e.g., "20 km mesoscale eddies" rather than "0.05 cycles/km")
Physical relevance: Wavelength directly corresponds to the physical size of oceanographic structures, making it easier to relate spectral features to real-world phenomena
Comparison with theory: Many oceanographic theories and models are expressed in terms of characteristic length scales rather than frequencies
The secondary axis is positioned at the top of the plot, with a clear label "Wavelength (km)" to distinguish it from the primary frequency axis. This arrangement allows readers to seamlessly translate between frequency and wavelength domains while interpreting the spectral results.
With this dual-axis approach, the visualization becomes more accessible to oceanographers from different backgrounds, whether they're more comfortable thinking in terms of frequencies (common in signal processing) or wavelengths (common in physical oceanography).
### 8.1.7 Secondary X-axis (Wavelength)
secax = ax.secondary_xaxis('top', functions=(lambda f: 1/f, lambda w: 1/w))
secax.set_xlabel('Wavelength (km)')
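As a quick numeric check of the reciprocal relationship the secondary axis encodes (values chosen for illustration only):

```python
# Frequency (cyc/km) and wavelength (km) are reciprocals, so one pair of
# lambdas covers both directions of the secondary-axis transform.
to_wavelength = lambda f: 1.0 / f   # cyc/km -> km
to_frequency = lambda w: 1.0 / w    # km -> cyc/km

print(to_wavelength(0.05))                # → 20.0 (a 20 km mesoscale feature)
print(to_frequency(to_wavelength(0.05)))  # → 0.05 (the round trip is exact)
```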
This section finalizes the core axes formatting, ensuring the figure is scientifically accurate, visually clear, and properly labeled. These formatting choices are crucial for effective communication of the spectral analysis results.
The code applies several important formatting elements:
Axis labels: "Frequency (cyc/km)" on the x-axis and "PSD [cm²/cyc/km]" on the y-axis, keeping the units explicit.
Title: "Ensemble PSD + Coherence (MVP, SWOT, DUACS)" clearly identifies the content and data sources in the figure.
Grid:
A subtle grid is added with which='both' to show grid lines at both major and minor tick marks, using dashed lines with 40% opacity for reference without visual distraction
Logarithmic scales:
Both axes use logarithmic scaling (set_xscale('log') and set_yscale('log')), which is standard for spectral analysis because it renders power laws as straight lines and keeps widely varying magnitudes readable.
X-axis limits: The frequency range is set to 10⁻³ to 1 cycles/km, corresponding to wavelengths from 1 km to 1000 km, spanning scales from basin-wide variability down to the submesoscale.
These formatting choices ensure that the spectral results are presented in a scientifically rigorous manner while maintaining visual clarity. The consistent use of units and clear labeling helps readers interpret the complex information contained in the spectral curves and coherence relationships.
### 8.1.8 Axes Formatting
ax.set_xlabel("Frequency (cyc/km)")
ax.set_ylabel("PSD [cm²/cyc/km]")
ax.set_title("Ensemble PSD + Coherence (MVP, SWOT, DUACS)")
ax.grid(True, which='both', linestyle='--', alpha=0.4)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlim(1e-3, 1)
This section adds a secondary y-axis to display the spectral coherence relationships alongside the power spectra. This dual-axis approach allows direct comparison of both the energy distribution and the scale-dependent linear relationships between different measurement systems.
The secondary y-axis is created using ax.twinx(), which shares the same x-axis but has an independent y-scale. Although coherence values range from 0 to 1, the axis limits run from 0 to 2 so that the coherence curves occupy only the lower half of the plot, keeping them visually separated from the PSD curves; the ticks above 1 are left unlabeled.
Three coherence relationships are plotted, each with a distinct dark color: MVP-SWOT (dark blue), MVP-DUACS (dark green), and SWOT-DUACS (dark red). For each pair, the code masks NaNs and non-positive frequencies, requires at least two valid points, and draws the curve with semilogx on the secondary axis. This integrated visualization lets both the energy content and the scale-dependent agreement between measurement systems be read from a single figure.
The coherence curves provide crucial context for interpreting the spectral results, revealing not just how energy is distributed across scales in each dataset, but how consistently the different measurement systems capture the same oceanographic signals.
### 8.1.9 Secondary Y-axis for Coherence
ax2 = ax.twinx()
ax2.set_ylim(0, 2)                              # coherence occupies the lower half, clear of the PSD curves
ax2.set_yticks([0, 0.5, 1, 1.5, 2])
ax2.set_yticklabels(['0', '0.5', '1', '', ''])  # ticks above 1 left unlabeled
ax2.set_ylabel('Coherence (0-1)')
coherence_colors = ['darkblue', 'darkgreen', 'darkred']
handles_coherence = []
# MVP–SWOT coherence
if freq_coh_ens_mvp_swot is not None and coh_ens_mvp_swot is not None:
valid_coh = (~np.isnan(freq_coh_ens_mvp_swot)) & (~np.isnan(coh_ens_mvp_swot)) & (freq_coh_ens_mvp_swot > 0)
if np.sum(valid_coh) > 1:
line_coherence_mvp_swot, = ax2.semilogx(freq_coh_ens_mvp_swot[valid_coh],
coh_ens_mvp_swot[valid_coh],
color=coherence_colors[0],
label='Coherence MVP-SWOT',
zorder=3)
handles_coherence.append(line_coherence_mvp_swot)
# MVP–DUACS coherence
if freq_coh_ens_mvp_duacs is not None and coh_ens_mvp_duacs is not None:
valid_coh = (~np.isnan(freq_coh_ens_mvp_duacs)) & (~np.isnan(coh_ens_mvp_duacs)) & (freq_coh_ens_mvp_duacs > 0)
if np.sum(valid_coh) > 1:
line_coherence_mvp_duacs, = ax2.semilogx(freq_coh_ens_mvp_duacs[valid_coh],
coh_ens_mvp_duacs[valid_coh],
color=coherence_colors[1],
label='Coherence MVP-DUACS',
zorder=3)
handles_coherence.append(line_coherence_mvp_duacs)
# SWOT–DUACS coherence
if freq_coh_ens_swot_duacs is not None and coh_ens_swot_duacs is not None:
valid_coh = (~np.isnan(freq_coh_ens_swot_duacs)) & (~np.isnan(coh_ens_swot_duacs)) & (freq_coh_ens_swot_duacs > 0)
if np.sum(valid_coh) > 1:
line_coherence_swot_duacs, = ax2.semilogx(freq_coh_ens_swot_duacs[valid_coh],
coh_ens_swot_duacs[valid_coh],
color=coherence_colors[2],
label='Coherence SWOT-DUACS',
zorder=3)
handles_coherence.append(line_coherence_swot_duacs)
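The coherence arrays plotted here (freq_coh_ens_mvp_swot and friends) are computed earlier in the notebook. A hedged sketch of how such a curve can be obtained for one pair of colocated along-track series with scipy.signal.coherence, using synthetic data (all names and numbers below are illustrative, not notebook values):

```python
import numpy as np
import scipy.signal as signal

rng = np.random.default_rng(0)
shared = np.cumsum(rng.standard_normal(512))        # common red-spectrum signal
series_a = shared + 0.1 * rng.standard_normal(512)  # e.g. an MVP-like series
series_b = shared + 0.1 * rng.standard_normal(512)  # e.g. a SWOT-like series

nperseg = 128
freq, coh = signal.coherence(series_a, series_b, fs=1.0, nperseg=nperseg,
                             noverlap=nperseg // 2, window='hann')

# The shared signal dominates at low frequencies, so coherence there is
# close to 1; it decays where the independent noise takes over.
print(coh[1:5].mean())
```

Averaging such per-transect coherence curves over all transects yields ensemble estimates like those plotted above.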
This final section creates a comprehensive legend that unifies all elements of the visualization, ensuring clear identification of each component while maintaining an organized and readable presentation.
The legend construction follows a systematic approach:
Element collection: The code gathers handles from all visual elements: background shading patches, PSD curves, regression and reference lines, and coherence curves.
Legend positioning: The legend is placed below the axes (loc='lower center'), anchored outside the plot area (bbox_to_anchor=(0.5, -0.25)), and arranged in three columns (ncol=3) for compact presentation.
Layout adjustment: tight_layout() optimizes the use of figure space, while subplots_adjust(bottom=0.3) reserves room for the legend below the plot.
This unified legend approach keeps every element identifiable without crowding the axes.
The resulting figure presents a complete and integrated view of the spectral characteristics of MVP, SWOT, and DUACS measurements, with clear identification of all components. This visualization enables comprehensive analysis of how well SWOT captures the spectral features observed in in-situ measurements across different spatial scales, with particular focus on the mesoscale-to-submesoscale transition region that is critical for understanding ocean dynamics.
Through this integrated presentation of power spectra, spectral slopes, and coherence relationships, the figure provides a definitive assessment of SWOT's spectral performance relative to both in-situ measurements and conventional altimetry.
### 8.1.10 Legend (all elements)
handles_curves = [h for h in [line_mvp, line_swot, line_duacs] if h is not None]
patches_fondo = [obj for obj in ax.patches] # shading patches
handles_all = patches_fondo + handles_curves + handles_slopes + handles_coherence
labels_all = [obj.get_label() for obj in handles_all]
fig.legend(handles_all, labels_all, loc='lower center',
bbox_to_anchor=(0.5, -0.25), ncol=3, borderaxespad=0.)
plt.tight_layout()
plt.subplots_adjust(bottom=0.3)
plt.show()
This section begins the creation of a second figure that provides a zoomed view of the spectral results, focusing specifically on the higher-frequency range (0.01 to 1 cycles/km) that encompasses the mesoscale and submesoscale regimes. This zoomed perspective allows for more detailed examination of the spectral features in the range where SWOT is expected to show improvements over conventional altimetry.
The code initializes a new figure with dimensions matching the previous comprehensive plot (8×6 inches), ensuring visual consistency between the two perspectives. A single subplot is created to contain the zoomed spectral information.
This focused visualization approach offers several important advantages:
Enhanced detail: By zooming in on the frequency range of greatest interest (corresponding to wavelengths from 1 to 100 km), the figure reveals subtle spectral features that might be difficult to discern in the full-range plot.
Clearer comparison: The zoomed view facilitates more precise comparison of spectral characteristics in the critical mesoscale-to-submesoscale transition region.
Better resolution assessment: It provides a clearer picture of where satellite measurements begin to diverge from in-situ observations, helping to quantify the effective resolution of SWOT.
This complementary zoomed view follows the same overall structure as the comprehensive figure, with spectral curves, regression lines, coherence relationships, and reference elements, but with the x-axis limits adjusted to highlight the frequency range of greatest oceanographic interest.
Together with the full-range figure, this zoomed visualization will provide a complete assessment of SWOT's spectral performance across the full range of relevant spatial scales.
### 8.2.1 Figure and Axes
fig_zoom = plt.figure(figsize=(8,6))
axz = fig_zoom.add_subplot(111)
This section recreates the spatial scale reference panels in the zoomed figure, maintaining the same color scheme and scale definitions as in the comprehensive plot. Although the figure will focus on higher frequencies, including all three scale regimes provides continuity with the previous figure and ensures clear understanding of the scale context.
The three scale regimes are again delineated: large scale (>100 km) with a yellow background, mesoscale (10-100 km) with a green background, and small scale (<10 km) with a red background.
Despite the x-axis limits being set to focus on frequencies from 0.01 to 1 cycles/km, including the complete set of background panels maintains visual consistency with the full-range figure and provides important context for interpreting the zoomed spectral features.
The consistent use of colors and scale definitions across both figures helps readers mentally connect the zoomed view to the broader spectral context, facilitating unified interpretation of the results. This approach is particularly valuable for highlighting SWOT's performance in the mesoscale-to-submesoscale transition region while maintaining awareness of how these scales relate to the broader spectrum of oceanographic features.
### 8.2.2 Spatial‐scale Background (zoom)
axz.axvspan(1e-6, 0.01, color='yellow', alpha=0.1, zorder=0, label='Large scale (>100 km)')
axz.axvspan(0.01, 0.1, color='green', alpha=0.1, zorder=0, label='Mesoscale (10-100 km)')
axz.axvspan(0.1, 1e2, color='red', alpha=0.1, zorder=0, label='Small scale (<10 km)')
This section defines a helper function specifically for the zoomed plot and uses it to add the power spectral density curves for all three measurement systems. The implementation follows the same approach as in the comprehensive figure, ensuring visual consistency while focusing on the higher-frequency range.
The plot_psd_line_zoom function mirrors the functionality of the previous helper, masking invalid points and drawing on log-log axes. The same color scheme is maintained for the three datasets, so curves can be matched across the two figures at a glance.
This consistency in visualization approach between the two figures helps readers make direct connections between the full-range and zoomed perspectives. The zoomed view will highlight several important features that may be harder to discern in the full-range plot:
Spectral divergence points: The frequencies at which satellite measurements begin to deviate from in-situ observations become more apparent
Relative performance: Differences between SWOT and DUACS in capturing higher-frequency features are emphasized
Noise characteristics: The high-frequency noise floor in each dataset becomes more visible
Resolution limits: The effective resolution of each measurement system can be more precisely identified
By maintaining the same visual approach while adjusting the focus to higher frequencies, this zoomed visualization provides enhanced detail in the critical mesoscale-to-submesoscale transition region without sacrificing consistency with the comprehensive spectral overview.
### 8.2.3 Helper for PSD Lines (zoom)
def plot_psd_line_zoom(freq, psd, color, label):
if freq is not None and psd is not None:
valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0)
if np.sum(valid) < 2:
return None
line, = axz.loglog(freq[valid], psd[valid], color=color, linestyle='-', label=label)
return line
return None
line_mvp_z = plot_psd_line_zoom(freq_mvp_ens, psd_mvp_ens, color_mvp, 'MVP PSD')
line_swot_z = plot_psd_line_zoom(freq_swot_ens, psd_swot_ens, color_swot, 'SWOT PSD')
line_duacs_z = plot_psd_line_zoom(freq_duacs_ens, psd_duacs_ens, color_duacs,'DUACS PSD')
This section adds power-law regression lines to the zoomed spectral plot, using the same approach as in the comprehensive figure. These regression lines quantify the spectral slopes, which are key indicators of the underlying physical processes and energy cascade characteristics.
For each dataset (MVP, SWOT, and DUACS), the code fits a power law with the loglog_regression function, exactly as in the full-range figure.
Although the same slope values are used as in the full-range plot (since they're calculated over the entire frequency range), their visualization in the zoomed context offers enhanced insight:
Fit quality assessment: The zoomed view makes it easier to assess how well the power-law model fits the actual data in the mesoscale and submesoscale ranges
Slope differences: Variations in slope between different measurement systems become more apparent, highlighting potential systematic differences in how they capture energy across scales
Deviation identification: Points where the actual spectra deviate from the power-law behavior can be more clearly identified, potentially revealing scale-dependent physical processes or measurement artifacts
The regression lines are particularly valuable in the zoomed view because they help quantify the spectral behavior precisely in the frequency range where SWOT is expected to show improvements over conventional altimetry. By directly comparing the slopes of MVP, SWOT, and DUACS spectra in this critical range, we can assess whether SWOT more accurately captures the true energy scaling observed in the in-situ measurements.
These quantitative slope comparisons provide a rigorous metric for evaluating SWOT's performance in resolving the energy cascade characteristics in the mesoscale-to-submesoscale transition region.
### 8.2.4 Regression Lines (zoom)
handles_slopes_z = []
if freq_mvp_ens is not None and psd_mvp_ens is not None:
m_mvp_z, c_mvp_z = loglog_regression(freq_mvp_ens, psd_mvp_ens)
if not np.isnan(m_mvp_z):
reg_line_mvp_z, = axz.loglog(freq_mvp_ens, 10**c_mvp_z * freq_mvp_ens**m_mvp_z,
color=color_mvp, linestyle='--',
label=f"MVP slope = {m_mvp_z:.2f}")
handles_slopes_z.append(reg_line_mvp_z)
if freq_swot_ens is not None and psd_swot_ens is not None:
m_swot_z, c_swot_z = loglog_regression(freq_swot_ens, psd_swot_ens)
if not np.isnan(m_swot_z):
reg_line_swot_z, = axz.loglog(freq_swot_ens, 10**c_swot_z * freq_swot_ens**m_swot_z,
color=color_swot, linestyle='--',
label=f"SWOT slope = {m_swot_z:.2f}")
handles_slopes_z.append(reg_line_swot_z)
if freq_duacs_ens is not None and psd_duacs_ens is not None:
m_duacs_z, c_duacs_z = loglog_regression(freq_duacs_ens, psd_duacs_ens)
if not np.isnan(m_duacs_z):
reg_line_duacs_z, = axz.loglog(freq_duacs_ens, 10**c_duacs_z * freq_duacs_ens**m_duacs_z,
color=color_duacs, linestyle='--',
label=f"DUACS slope = {m_duacs_z:.2f}")
handles_slopes_z.append(reg_line_duacs_z)
This section adds the theoretical reference slope lines to the zoomed plot, maintaining consistency with the comprehensive figure while providing important context for interpreting the spectral characteristics in relation to established oceanographic turbulence theories.
The code adds the same two reference lines: slope -3 (black dotted line) and slope -2 (gray dotted line).
These reference lines have particular significance in the zoomed view because:
Theoretical transitions: The mesoscale-to-submesoscale transition region highlighted in the zoomed plot is precisely where oceanographic theory suggests a potential shift from -3 (QG) to -2 (SQG) spectral slopes
Resolution assessment: Comparison with these reference lines helps determine whether SWOT can resolve the scales where such theoretical transitions might occur
Filtering effects: Deviations from theoretical slopes at higher frequencies may indicate filtering effects in satellite processing
The reference lines are scaled to the same amplitude as in the comprehensive figure (80% of the first MVP PSD value) and use the same dotted line style for visual consistency. This consistency helps readers connect observations between the two figures while the zoomed perspective provides enhanced detail for precise comparison in the critical higher-frequency range.
By including these theoretical references in the zoomed view, the visualization enables more detailed assessment of how the observed spectral characteristics align with established oceanographic theories at the scales most relevant for evaluating SWOT's performance.
### 8.2.5 Reference‐slope Lines (zoom)
f_theory_z = np.logspace(-3, 0, 200)  # log-spaced so the reference lines plot straight on log-log axes
ref_val_z = psd_mvp_ens[0] * 0.8 if (psd_mvp_ens is not None) and (len(psd_mvp_ens) > 0) else 1e4
theory_line_m3_z, = axz.loglog(f_theory_z, ref_val_z * (f_theory_z/f_theory_z[0])**(-3),
color='black', linestyle=':', label="Ref slope -3")
theory_line_m2_z, = axz.loglog(f_theory_z, ref_val_z * (f_theory_z/f_theory_z[0])**(-2),
color='gray', linestyle=':', label="Ref slope -2")
handles_slopes_z.extend([theory_line_m3_z, theory_line_m2_z])
This section adds a secondary x-axis to the zoomed plot, displaying spatial wavelength instead of frequency, just as in the comprehensive figure. This dual representation is particularly valuable in the zoomed view, as it helps oceanographers immediately recognize the physical scales of features being examined.
The implementation uses the same approach as before, calling secondary_xaxis('top') with the reciprocal conversion pair.
In the zoomed view, this wavelength axis takes on additional significance:
Key scale identification: The zoomed frequency range of 0.01 to 1 cycles/km corresponds to wavelengths from 1 to 100 km, encompassing the mesoscale and submesoscale regimes.
Resolution assessment: The wavelength axis makes it immediately clear at what physical scales satellite measurements begin to diverge from in-situ observations.
Oceanographic feature context: Specific wavelengths can be directly associated with known oceanographic features, such as mesoscale eddies and submesoscale fronts.
By maintaining this dual representation in the zoomed view, the figure allows oceanographers to interpret the spectral results directly in terms of the physical scales they're familiar with, enhancing the practical utility of the analysis for understanding SWOT's capabilities in resolving features of different sizes.
### 8.2.6 Secondary X-axis (wavelength, zoom)
secax_z = axz.secondary_xaxis('top', functions=(lambda f: 1/f, lambda w: 1/w))
secax_z.set_xlabel('Wavelength (km)')
This section applies final formatting to the zoomed figure, maintaining consistency with the comprehensive plot while adjusting the view to focus on higher frequencies. The careful formatting ensures the zoomed plot maintains scientific accuracy while highlighting the critical mesoscale and submesoscale regimes.
The code applies similar formatting as the comprehensive figure:
Axis labels: identical to the full-range figure ("Frequency (cyc/km)" and "PSD [cm²/cyc/km]").
Title: "Ensemble PSD + Coherence (MVP, SWOT, DUACS) – Zoom" clearly identifies this as a zoomed version of the previous plot.
Grid: The same subtle grid with dashed lines at 40% opacity is applied, providing reference lines without visual distraction
Logarithmic scales: Both axes maintain logarithmic scaling for consistent representation of spectral characteristics
X-axis limits: The key difference lies here. The frequency range is set to 0.01 to 1 cycles/km, corresponding to wavelengths from 1 to 100 km, which restricts the view to the mesoscale and submesoscale regimes.
This focused view provides enhanced detail in the regions of greatest interest for evaluating SWOT's performance while maintaining visual and conceptual consistency with the comprehensive figure. By zooming in on these specific scales, the plot enables more precise assessment of spectral slopes, divergence points, and noise floors. These insights are crucial for quantifying SWOT's effective resolution and its ability to capture the dynamics of mesoscale and submesoscale oceanographic features.
### 8.2.7 Axes Formatting (zoom)
axz.set_xlabel("Frequency (cyc/km)")
axz.set_ylabel("PSD [cm²/cyc/km]")
axz.set_title("Ensemble PSD + Coherence (MVP, SWOT, DUACS) – Zoom")
axz.grid(True, which='both', linestyle='--', alpha=0.4)
axz.set_xscale('log')
axz.set_yscale('log')
axz.set_xlim(0.01, 1)
This section adds a secondary y-axis to display coherence relationships in the zoomed figure, following the same approach as in the comprehensive plot. This dual-axis visualization enables simultaneous assessment of both energy distribution and linear relationships between measurement systems, with enhanced focus on the higher-frequency range.
The secondary y-axis is configured identically to the comprehensive figure, using axz.twinx() to share the x-axis while keeping an independent y-scale. The same three coherence relationships are plotted with consistent dark colors: MVP-SWOT (dark blue), MVP-DUACS (dark green), and SWOT-DUACS (dark red).
In the zoomed view, these coherence curves take on enhanced significance because:
Resolution limits: The frequency at which coherence drops below a threshold value (e.g., 0.6) can define the effective resolution of satellite measurements
Relative performance: Differences in coherence between SWOT and DUACS become more apparent, particularly at higher frequencies
Noise identification: Rapid coherence drops may indicate the scale at which noise begins to dominate the satellite measurements
The zoomed perspective allows more precise identification of these transition points, providing crucial insight into SWOT's capability to resolve finer-scale features compared to conventional altimetry. By examining where coherence begins to decline between MVP and satellite measurements, we can quantify SWOT's effective resolution in capturing real oceanographic signals.
### 8.2.8 Secondary Y-axis for Coherence (zoom)
ax2z = axz.twinx()
ax2z.set_ylim(0, 2)
ax2z.set_yticks([0, 0.5, 1, 1.5, 2])
ax2z.set_yticklabels(['0', '0.5', '1', '', ''])
ax2z.set_ylabel('Coherence (0-1)')
handles_coherence_z = []
# MVP–SWOT coherence (zoom)
if freq_coh_ens_mvp_swot is not None and coh_ens_mvp_swot is not None:
valid_coh_z = (~np.isnan(freq_coh_ens_mvp_swot)) & (~np.isnan(coh_ens_mvp_swot)) & (freq_coh_ens_mvp_swot > 0)
if np.sum(valid_coh_z) > 1:
line_coherence_mvp_swot_z, = ax2z.semilogx(freq_coh_ens_mvp_swot[valid_coh_z],
coh_ens_mvp_swot[valid_coh_z],
color=coherence_colors[0],
label='Coherence MVP-SWOT',
zorder=3)
handles_coherence_z.append(line_coherence_mvp_swot_z)
# MVP–DUACS coherence (zoom)
if freq_coh_ens_mvp_duacs is not None and coh_ens_mvp_duacs is not None:
valid_coh_z = (~np.isnan(freq_coh_ens_mvp_duacs)) & (~np.isnan(coh_ens_mvp_duacs)) & (freq_coh_ens_mvp_duacs > 0)
if np.sum(valid_coh_z) > 1:
line_coherence_mvp_duacs_z, = ax2z.semilogx(freq_coh_ens_mvp_duacs[valid_coh_z],
coh_ens_mvp_duacs[valid_coh_z],
color=coherence_colors[1],
label='Coherence MVP-DUACS',
zorder=3)
handles_coherence_z.append(line_coherence_mvp_duacs_z)
# SWOT–DUACS coherence (zoom)
if freq_coh_ens_swot_duacs is not None and coh_ens_swot_duacs is not None:
valid_coh_z = (~np.isnan(freq_coh_ens_swot_duacs)) & (~np.isnan(coh_ens_swot_duacs)) & (freq_coh_ens_swot_duacs > 0)
if np.sum(valid_coh_z) > 1:
line_coherence_swot_duacs_z, = ax2z.semilogx(freq_coh_ens_swot_duacs[valid_coh_z],
coh_ens_swot_duacs[valid_coh_z],
color=coherence_colors[2],
label='Coherence SWOT-DUACS',
zorder=3)
handles_coherence_z.append(line_coherence_swot_duacs_z)
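The coherence-threshold idea described above can be made concrete. This is a hypothetical helper, not part of the notebook: it returns the wavelength at which ensemble coherence first drops below a chosen threshold, a common proxy for the effective resolution of a satellite product (the arrays below are illustrative):

```python
import numpy as np

def coherence_resolution(freq, coh, threshold=0.6):
    # First frequency bin where coherence falls below the threshold,
    # converted from cyc/km to a wavelength in km.
    below = np.where(coh < threshold)[0]
    if below.size == 0:
        return None  # coherent across the whole range examined
    return 1.0 / freq[below[0]]

freq = np.array([0.01, 0.02, 0.05, 0.1, 0.2])
coh = np.array([0.95, 0.90, 0.70, 0.50, 0.20])
print(coherence_resolution(freq, coh))  # → 10.0 (first drop below 0.6 at 0.1 cyc/km)
```

Applying such a helper to the MVP-SWOT and MVP-DUACS curves would turn the visual comparison into a single resolution number per product.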
This final section completes the zoomed figure by adding a comprehensive legend that identifies all visual elements, following the same approach as in the full-range plot. This ensures consistency between the two figures while providing clear reference for interpreting the zoomed spectral results.
The legend construction mirrors the previous approach:
Element collection: The code gathers handles from all visual components: background shading patches, PSD curves, regression and reference lines, and coherence curves.
Legend positioning: below the axes (loc='lower center'), anchored outside the plot area (bbox_to_anchor=(0.5, -0.25)), in three columns (ncol=3) for compact presentation.
Layout optimization: tight_layout() maximizes use of figure space, and subplots_adjust(bottom=0.3) reserves room for the legend below the plot.
This consistent approach to legend construction ensures that readers can easily interpret both figures using the same visual reference system. The zoomed figure is then displayed with plt.show(), completing the visualization process.
Together, the comprehensive and zoomed figures provide a complete spectral analysis of SWOT performance across the full range of resolved scales. This dual-visualization approach enables both qualitative assessment and quantitative analysis of how well SWOT captures oceanographic features, and the zoomed perspective in particular allows precise identification of divergence points, noise floors, and effective resolution limits. These insights are crucial for understanding SWOT's capabilities and limitations in resolving fine-scale oceanographic features, with direct implications for oceanographic research and operational applications.
### 8.2.9 Legend (zoom)
patches_fondo_z = [obj for obj in axz.patches]
handles_curves_z = [h for h in [line_mvp_z, line_swot_z, line_duacs_z] if h is not None]
handles_all_zoom = patches_fondo_z + handles_curves_z + handles_slopes_z + handles_coherence_z
labels_all_zoom = [obj.get_label() for obj in handles_all_zoom]
fig_zoom.legend(handles_all_zoom, labels_all_zoom, loc='lower center',
bbox_to_anchor=(0.5, -0.25), ncol=3, borderaxespad=0.)
plt.tight_layout()
plt.subplots_adjust(bottom=0.3)
plt.show()
# 1. IMPORT LIBRARIES
## 1.1 Core scientific stack
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.signal as signal
import warnings
import gsw
import scipy.io as sio
import geopy.distance
from datetime import datetime, timedelta
import glob, os
from scipy.interpolate import griddata
from tqdm import tqdm
import xarray as xr
warnings.filterwarnings("ignore", category=RuntimeWarning)
# 2. CONFIGURATION PARAMETERS
MIN_SEGMENT_LENGTH = 10 # minimum points to retain a segment
# 3. DATE UTILITIES
## 3.1 MATLAB ordinal‐date conversion
def read_mat_datetime(date_array):
"""
Convert MATLAB ordinal dates in 'date_array' to Python datetime objects.
"""
dates_list = []
array_flat = date_array.reshape(-1)
for val_ in array_flat:
vv = val_
while isinstance(vv, (list, tuple, np.ndarray)):
if isinstance(vv, np.ndarray):
vv = vv.item() if vv.size == 1 else vv.ravel()[0]
elif isinstance(vv, tuple):
vv = vv[0] if vv else None
if isinstance(vv, (int, float)):
frac = vv % 1
try:
dt_ = datetime.fromordinal(int(vv)) + timedelta(days=frac) - timedelta(days=366)
dates_list.append(dt_)
                except (ValueError, OverflowError):
                    pass  # skip values outside datetime's representable range
return dates_list
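The 366-day offset reflects the different epochs: MATLAB datenum counts days from year 0, while Python's proleptic ordinal counts from year 1. A self-contained check of the conversion used above (the example datenum is hypothetical):

```python
from datetime import datetime, timedelta

def matlab_datenum_to_datetime(dn):
    # MATLAB day 1 = 0000-01-01; Python ordinal day 1 = 0001-01-01,
    # hence the constant 366-day shift. The fractional part carries
    # the time of day.
    return (datetime.fromordinal(int(dn))
            + timedelta(days=dn % 1)
            - timedelta(days=366))

print(matlab_datenum_to_datetime(738977.5))  # → 2023-04-01 12:00:00
```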
# 4. DATA LOADING FUNCTIONS
## 4.1 Generic SSH/U/V reader
def loadsshuv(fname, rep, varn, **kwargs):
"""
Load longitude, latitude, SSH, velocity, and ADT fields from a NetCDF file.
"""
if 'type' in kwargs and kwargs['type'] == 'swot':
filei2 = glob.glob(rep + fname)[0]
ds = xr.open_dataset(filei2)
if 'area' in kwargs:
area = kwargs['area']
selection = ((ds.longitude > area[0]) &
(ds.longitude < area[2]) &
(ds.latitude > area[1]) &
(ds.latitude < area[3]))
ds = ds.where(selection)
lons = ds.longitude.to_masked_array()
lats = ds.latitude.to_masked_array()
us = ds.ugos.to_masked_array()
vs = ds.vgos.to_masked_array()
ssha = ds.ssha_filtered.to_masked_array()
mss = ds.mss.to_masked_array()
mdt = ds.mdt.to_masked_array()
mask_row = np.all(np.isnan(us), axis=1)
u = us[~mask_row, :]
v = vs[~mask_row, :]
lon = lons[~mask_row, :]
lat = lats[~mask_row, :]
ssha = ssha[~mask_row, :]
mss = mss[~mask_row, :]
mdt = mdt[~mask_row, :]
adt = ssha + mdt
else:
filei = glob.glob(rep + fname)[0]
ds = xr.open_dataset(filei)
lon = ds[varn['longitude']][:]
lat = ds[varn['latitude']][:]
ssha = ds[varn['ssh']][:]
adt = ds['adt'][:]
u = ds[varn['u']][:]
v = ds[varn['v']][:]
return {'lon': lon, 'lat': lat, 'ssh': ssha, 'u': u, 'v': v,
'ssha': ssha, 'adt': adt}
## 4.2 SWOT colocation
def load_SWOT(mvp, dayv, swotdir, savefig, **kwargs):
"""
Load SWOT data and interpolate ADT onto MVP points.
"""
domain = [np.floor(np.min(mvp['lon'])) - 0.5,
np.floor(np.min(mvp['lat'])) - 0.5,
np.ceil(np.max(mvp['lon'])) + 0.5,
np.ceil(np.max(mvp['lat'])) + 0.5]
dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
varn = {'longitude': 'longitude', 'latitude': 'latitude',
'u': 'ugos', 'v': 'vgos', 'ssh': 'ssha_filtered'}
fname = glob.glob(swotdir + 'SWOT_L3_LR_SSH_Expert_*_' +
dayv_dt.strftime('%Y%m%d') + '*_*.nc')
if not fname:
print(f"No SWOT file found for date {dayv}")
return None
swot = loadsshuv(fname[0], '', varn, type='swot', area=domain)
points = np.column_stack((swot['lon'].flatten(), swot['lat'].flatten()))
adtc = griddata(points, swot['adt'].flatten(), (mvp['lon'], mvp['lat']), method='linear')
return np.squeeze(adtc)
## 4.3 DUACS colocation
def load_DUACS(mvp, dayv, duacsdir, savefig, **kwargs):
"""
Load DUACS data and interpolate ADT onto MVP points.
"""
dayv_dt = datetime.strptime(dayv, "%Y-%m-%d")
varn = {'longitude': 'longitude', 'latitude': 'latitude',
'u': 'ugos', 'v': 'vgos', 'ssh': 'sla'}
fname = glob.glob(duacsdir + 'nrt_europe_allsat_phy_l4_' +
dayv_dt.strftime('%Y%m%d') + '_*.nc')
if not fname:
print(f"No DUACS file found for date {dayv}")
return None
duacs = loadsshuv(fname[0], '', varn)
X, Y = np.meshgrid(duacs['lon'].values, duacs['lat'].values)
points = np.column_stack((X.flatten(), Y.flatten()))
adtd = griddata(points, duacs['adt'].values.flatten(), (mvp['lon'], mvp['lat']), method='linear')
return np.squeeze(adtd)
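Both colocation helpers rely on `scipy.interpolate.griddata` with `method='linear'` to bring ADT onto the ship track. A quick standalone sanity check with synthetic, illustrative data: piecewise-linear interpolation reproduces a planar field exactly at query points inside the convex hull of the input points.

```python
import numpy as np
from scipy.interpolate import griddata

# Scattered "satellite" points on a small grid carrying a planar field z = x + 2*y.
x, y = np.meshgrid(np.linspace(0.0, 1.0, 6), np.linspace(0.0, 1.0, 6))
points = np.column_stack((x.flatten(), y.flatten()))
values = points[:, 0] + 2.0 * points[:, 1]

# "Ship-track" query points strictly inside the domain.
qx = np.array([0.25, 0.50, 0.83])
qy = np.array([0.33, 0.50, 0.12])

zq = griddata(points, values, (qx, qy), method='linear')
# A planar field is recovered exactly by linear (barycentric) interpolation.
```

Query points outside the hull would return NaN, which is why the domain above is padded by 0.5° around the MVP track.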
# 5. SPECTRAL ANALYSIS UTILITIES
## 5.1 Segmentation and power spectral density
### 5.1.1 NaN‐aware splitting
def split_on_nan(data_array, max_consecutive_nans=4):
"""
Split a 1-D array into continuous segments separated by ≥ max_consecutive_nans NaNs.
"""
segments = []
n = len(data_array)
start_idx = 0
nan_count = 0
for i in range(n):
nan_count = nan_count + 1 if np.isnan(data_array[i]) else 0
if nan_count >= max_consecutive_nans:
end_idx = i - max_consecutive_nans + 1  # exclusive end: keep the last valid point before the NaN run
if end_idx > start_idx:
segm = data_array[start_idx:end_idx]
if len(segm) > 1:
segments.append(segm)
start_idx = i
nan_count = 0
if start_idx < n:
segm = data_array[start_idx:]
if len(segm) > 1:
segments.append(segm)
return segments
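The splitting logic can be sanity-checked in isolation. The sketch below is a compact reimplementation of the same idea (not the notebook's exact function): cut wherever a run of NaNs reaches the threshold, and keep only segments longer than one sample.

```python
import numpy as np

def split_on_runs_of_nan(a, max_nans=4):
    """Split 'a' wherever a run of >= max_nans consecutive NaNs occurs."""
    a = np.asarray(a, dtype=float)
    segments, start, run = [], 0, 0
    for i in range(len(a)):
        run = run + 1 if np.isnan(a[i]) else 0
        if run == max_nans:                  # NaN run just reached the threshold
            end = i - max_nans + 1           # exclusive end = first index of the run
            if end - start > 1:
                segments.append(a[start:end])
            start = i + 1                    # next segment starts after this position
    if len(a) - start > 1:                   # trailing segment
        segments.append(a[start:])
    return segments

segs = split_on_runs_of_nan([1.0, 2.0, np.nan, np.nan, 3.0, 4.0], max_nans=2)
```

With a threshold of 2, the two NaNs in the middle split the toy array into two clean segments; shorter NaN runs would be left inside a segment and removed later, before the PSD is computed.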
### 5.1.2 Welch PSD
def welch_psd_1D(data_array, fs=1.0):
"""
Compute 1-D PSD via Welch’s method (Hann window).
"""
n = len(data_array)
if n < 8:
return None, None
nperseg = min(int(n // 2), 256)
freq, psd = signal.welch(data_array, fs=fs, nperseg=nperseg,
noverlap=nperseg // 2, window='hann',
detrend='constant')
return freq, psd
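As a standalone check of the Welch settings used here (Hann window, 50 % overlap, constant detrend), the PSD of a pure sinusoid should peak at its known frequency. The example below uses illustrative values, with `fs = 1.0` standing in for one sample per kilometre as on the regridded track:

```python
import numpy as np
from scipy import signal

fs = 1.0                         # one sample per km (illustrative)
x = np.arange(512)
f0 = 0.1                         # known tone frequency, cyc/km
data = np.sin(2.0 * np.pi * f0 * x)

nperseg = min(len(data) // 2, 256)
freq, psd = signal.welch(data, fs=fs, nperseg=nperseg,
                         noverlap=nperseg // 2, window='hann',
                         detrend='constant')

f_peak = freq[np.argmax(psd)]    # lands on the frequency bin nearest f0
```

The frequency resolution is `fs / nperseg` (here 1/256 cyc/km), so the recovered peak can differ from `f0` by up to half a bin.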
### 5.1.3 PSD for segmented data
def compute_segments_psd(data_array, max_nan=4, fs=1.0, min_length=10):
"""
Compute Welch PSD for each sufficiently long segment of 'data_array'.
"""
segments = split_on_nan(data_array, max_nan)
results = []
for seg in segments:
seg_clean = seg[~np.isnan(seg)]
if len(seg_clean) < min_length:
continue
f, p = welch_psd_1D(seg_clean, fs)
if f is not None and p is not None:
results.append((f, p))
return results
### 5.1.4 Log–log regression of PSD
def loglog_regression(freq, psd):
"""
Fit log10(PSD) = m log10(freq) + c and return slope m, intercept c.
"""
valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0) & (freq > 0)
if np.sum(valid) < 2:
return np.nan, np.nan
m, c = np.polyfit(np.log10(freq[valid]), np.log10(psd[valid]), 1)
return m, c
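The spectral-slope fit can be verified on a synthetic power-law spectrum: for an ideal `k^-3` spectrum, `np.polyfit` in log10 space recovers the exponent and the prefactor (values below are illustrative).

```python
import numpy as np

freq = np.logspace(-3, 0, 50)        # cyc/km
psd = 2.5 * freq ** (-3.0)           # ideal k^-3 spectrum

valid = (freq > 0) & (psd > 0)
m, c = np.polyfit(np.log10(freq[valid]), np.log10(psd[valid]), 1)
# log10(psd) = -3 * log10(freq) + log10(2.5), so m ~ -3 and c ~ log10(2.5)
```

This is exactly the model fitted by `loglog_regression`, which additionally guards against NaNs and non-positive values before taking logarithms.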
## 5.2 Spectral coherence
### 5.2.1 Paired NaN‐aware splitting
def split_on_nan_pair(data1, data2, max_consecutive_nans=4):
"""
Split paired arrays into segments, honouring NaNs in either input.
"""
data1 = np.asarray(data1)
data2 = np.asarray(data2)
combined_nan = np.isnan(data1) | np.isnan(data2)
segments = []
n = len(data1)
start_idx = 0
nan_count = 0
for i in range(n):
nan_count = nan_count + 1 if combined_nan[i] else 0
if nan_count >= max_consecutive_nans:
end_idx = i - max_consecutive_nans + 1  # exclusive end: keep the last valid pair before the NaN run
if end_idx > start_idx:
segm1 = data1[start_idx:end_idx]
segm2 = data2[start_idx:end_idx]
if len(segm1) > 1:
segments.append((segm1, segm2))
start_idx = i
nan_count = 0
if start_idx < n:
segm1 = data1[start_idx:]
segm2 = data2[start_idx:]
if len(segm1) > 1:
segments.append((segm1, segm2))
return segments
### 5.2.2 Welch coherence
def welch_coherence_1D(signal1, signal2, fs=1.0):
"""
Compute Welch spectral coherence between two signals.
"""
n = len(signal1)
if n < 8:
return None, None
nperseg = min(int(n // 2), 256)
freq, coh = signal.coherence(signal1, signal2, fs=fs, window='hann',
nperseg=nperseg, noverlap=nperseg // 2,
detrend='constant')
return freq, coh
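A standalone sanity check for `scipy.signal.coherence` with these settings: the magnitude-squared coherence of a signal with itself is one at every frequency, since `|Pxy|^2 = Pxx * Pyy` when the two inputs coincide (the noise signal below is illustrative).

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = rng.standard_normal(512)

nperseg = min(len(x) // 2, 256)
freq, coh = signal.coherence(x, x, fs=1.0, window='hann',
                             nperseg=nperseg, noverlap=nperseg // 2,
                             detrend='constant')
# Self-coherence equals 1 (up to floating-point error) at all frequencies.
```

With two independent signals the same call would instead return values scattered well below one, which is why the ensemble averaging over many segments matters for interpreting the MVP-SWOT curves.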
### 5.2.3 Coherence for segmented data
def compute_segments_coherence(data1, data2, max_nan=4, fs=1.0, min_length=10):
"""
Compute Welch coherence for each paired segment of (data1, data2).
"""
segments = split_on_nan_pair(data1, data2, max_nan)
results = []
for seg1, seg2 in segments:
mask_valid = (~np.isnan(seg1)) & (~np.isnan(seg2))
seg1_clean = seg1[mask_valid]
seg2_clean = seg2[mask_valid]
if len(seg1_clean) < min_length:
continue
f, co = welch_coherence_1D(seg1_clean, seg2_clean, fs=fs)
if f is not None and co is not None:
results.append((f, co))
return results
# 6. LOOP OVER PROFILING LINES
## 6.1 Initialisation of accumulators
PLs = np.arange(5, 25) # profiling lines 5 – 24
# Power-spectral-density (PSD) containers
global_freq_mvp, global_psd_mvp = [], []
global_freq_swot, global_psd_swot = [], []
global_freq_duacs, global_psd_duacs = [], []
# Spectral-coherence containers
global_freq_coh_mvp_swot, global_coh_mvp_swot = [], []
global_freq_coh_mvp_duacs, global_coh_mvp_duacs = [], []
global_freq_coh_swot_duacs, global_coh_swot_duacs = [], []
## 6.2 Iterative processing
for PL in PLs:
print(f"\n[SPECTRAL ANALYSIS] – PL{PL} in progress...")
try:
# ### 6.2.1 MVP data loading and dynamic-height computation
mat_file = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2/PL{PL}.mat')
date_array = mat_file['metadata']['date']
dates_mvp = read_mat_datetime(date_array)
if len(dates_mvp) == 0:
raise ValueError(f"No valid date in PL{PL}")
mat = sio.loadmat(f'/home/bessiere/Documents/MVP_TOOLS/BIOSWOTMED/MVP/L2_python/PL{PL}_python.mat')
lon_mvp = mat['lon']
lat_mvp = mat['lat']
p_mvp = mat['pres']
t_mvp = mat['temp']
SP_mvp = mat['sal']
SA = gsw.SA_from_SP(SP_mvp, p_mvp, lon_mvp.T, lat_mvp.T)
CT = gsw.CT_from_t(SA, t_mvp, p_mvp)
g_ = 9.81
dh_list = []
for i in range(SA.shape[0]):
valid_ = ~np.isnan(SA[i])
if np.sum(valid_) == 0:
dh_list.append(np.nan)
continue
p_ref = np.nanmax(p_mvp[i][valid_]) # reference level = deepest valid pressure (p_ref = p_max)
if np.isnan(p_ref):
dh_list.append(np.nan)
continue
dh_i = gsw.geo_strf_dyn_height(SA[i][valid_], CT[i][valid_],
p_mvp[i][valid_], p_ref=p_ref)[0] / g_
if isinstance(dh_i, np.ndarray):
dh_i = dh_i.item() if dh_i.size == 1 else dh_i[0]
dh_list.append(dh_i)
dh_array = np.array(dh_list, dtype=float)
if np.all(np.isnan(dh_array)):
raise ValueError(f"PL{PL}: All dh are NaN")
dhc = dh_array - np.nanmean(dh_array) # centred anomaly
# ### 6.2.2 MVP transect dataframe and distance along track
n_profiles = SA.shape[0]
if lon_mvp.shape[1] < n_profiles:
raise ValueError(f"PL{PL}: lon_mvp dimension mismatch")
df_mvp_raw = pd.DataFrame({
'dh': dh_array,
'dhc': dhc,
'lon': lon_mvp[0, :n_profiles],
'lat': lat_mvp[0, :n_profiles]
})
dist_along = np.zeros(n_profiles)
for i in range(1, n_profiles):
dist_along[i] = dist_along[i-1] + geopy.distance.distance(
(df_mvp_raw['lat'].values[i-1], df_mvp_raw['lon'].values[i-1]),
(df_mvp_raw['lat'].values[i], df_mvp_raw['lon'].values[i])
).km
df_mvp_raw['dist'] = dist_along
# ### 6.2.3 Regular 1 km re-sampling (no smoothing)
step_km = 1
dist_max = df_mvp_raw['dist'].iloc[-1]
dist_reg = np.arange(0, dist_max + step_km, step_km)
lon_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lon'])
lat_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['lat'])
dh_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dh'])
dhc_reg = np.interp(dist_reg, df_mvp_raw['dist'], df_mvp_raw['dhc'])
df_mvp = pd.DataFrame({
'dist': dist_reg,
'lon': lon_reg,
'lat': lat_reg,
'dh': dh_reg,
'dhc': dhc_reg
})
# ### 6.2.4 External ADT colocation (SWOT & DUACS)
dayv_str = dates_mvp[0].strftime('%Y-%m-%d')
adt_swot = load_SWOT(df_mvp, dayv_str,
'/home/favaloro/Téléchargements/SWOT_V2.0.0/v2.0.0/', '')
adt_swot = np.full(len(df_mvp), np.nan) if adt_swot is None else adt_swot - np.nanmean(adt_swot)
adt_duacs = load_DUACS(df_mvp, dayv_str,
'/home/bessiere/Documents/DUACS/', '')
adt_duacs = np.full(len(df_mvp), np.nan) if adt_duacs is None else adt_duacs - np.nanmean(adt_duacs)
# ### 6.2.5 PSD computation and storage
segments_psd_mvp = compute_segments_psd(df_mvp['dhc'].values, max_nan=4,
fs=1.0, min_length=MIN_SEGMENT_LENGTH)
for f, p in segments_psd_mvp:
global_freq_mvp.append(f)
global_psd_mvp.append(p)
segments_psd_swot = compute_segments_psd(adt_swot, max_nan=4,
fs=1.0, min_length=MIN_SEGMENT_LENGTH)
for f, p in segments_psd_swot:
global_freq_swot.append(f)
global_psd_swot.append(p)
segments_psd_duacs = compute_segments_psd(adt_duacs, max_nan=4,
fs=1.0, min_length=MIN_SEGMENT_LENGTH)
for f, p in segments_psd_duacs:
global_freq_duacs.append(f)
global_psd_duacs.append(p)
# ### 6.2.6 Coherence computation and storage
segments_coh_mvp_swot = compute_segments_coherence(df_mvp['dhc'].values,
adt_swot, max_nan=4,
fs=1.0, min_length=MIN_SEGMENT_LENGTH)
for f, co in segments_coh_mvp_swot:
global_freq_coh_mvp_swot.append(f)
global_coh_mvp_swot.append(co)
segments_coh_mvp_duacs = compute_segments_coherence(df_mvp['dhc'].values,
adt_duacs, max_nan=4,
fs=1.0, min_length=MIN_SEGMENT_LENGTH)
for f, co in segments_coh_mvp_duacs:
global_freq_coh_mvp_duacs.append(f)
global_coh_mvp_duacs.append(co)
segments_coh_swot_duacs = compute_segments_coherence(adt_swot, adt_duacs,
max_nan=4, fs=1.0,
min_length=MIN_SEGMENT_LENGTH)
for f, co in segments_coh_swot_duacs:
global_freq_coh_swot_duacs.append(f)
global_coh_swot_duacs.append(co)
except Exception as e:
print(f"Error in PL{PL}: {e}")
continue
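For reference, the quantity computed in step 6.2.1 is the standard TEOS-10 dynamic height anomaly: `gsw.geo_strf_dyn_height` returns $\Psi$ in m² s⁻², which the loop converts to metres by dividing by $g$:

$$
\eta_{\mathrm{dyn}} \;=\; \frac{\Psi}{g} \;=\; \frac{1}{g}\int_{p}^{p_{\mathrm{ref}}} \delta\!\left(S_A,\,\Theta,\,p'\right)\,\mathrm{d}p'
$$

where $\delta$ is the specific volume anomaly and $p_{\mathrm{ref}}$ is set per profile to its deepest valid pressure, taken here as the assumed level of no motion.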
# 7. ENSEMBLE AVERAGING
## 7.1 Ensemble-average helpers
def ensemble_average_psd(freq_list, psd_list):
"""
Return common-grid frequency and ensemble-mean PSD.
"""
if len(freq_list) == 0:
return None, None
freq_common = freq_list[0][freq_list[0] > 0]
psd_interp_all = [psd_list[0][freq_list[0] > 0]]
for freq_, psd_ in zip(freq_list[1:], psd_list[1:]):
valid = freq_ > 0
psd_interp = psd_ if np.array_equal(freq_[valid], freq_common) else \
np.interp(freq_common, freq_[valid], psd_[valid],
left=np.nan, right=np.nan)
psd_interp_all.append(psd_interp)
return freq_common, np.nanmean(np.array(psd_interp_all), axis=0)
def ensemble_average_coherence(freq_list, coh_list):
"""
Return common-grid frequency and ensemble-mean coherence.
"""
if len(freq_list) == 0:
return None, None
freq_common = freq_list[0][freq_list[0] > 0]
coh_interp_all = [coh_list[0][freq_list[0] > 0]]
for freq_, coh_ in zip(freq_list[1:], coh_list[1:]):
valid = freq_ > 0
coh_interp = coh_ if np.array_equal(freq_[valid], freq_common) else \
np.interp(freq_common, freq_[valid], coh_[valid],
left=np.nan, right=np.nan)
coh_interp_all.append(coh_interp)
return freq_common, np.nanmean(np.array(coh_interp_all), axis=0)
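Both ensemble helpers interpolate every spectrum onto the first segment's frequency grid before averaging. A minimal standalone check of that step with two illustrative spectra on different grids:

```python
import numpy as np

freq_a = np.array([0.1, 0.2, 0.3, 0.4])          # common grid (first spectrum)
psd_a = np.array([4.0, 2.0, 1.0, 0.5])
freq_b = np.array([0.1, 0.15, 0.25, 0.4])        # different grid
psd_b = np.array([8.0, 6.0, 3.0, 1.5])

# Interpolate the second spectrum onto the common grid, NaN outside its range,
# then take the NaN-aware ensemble mean.
psd_b_interp = np.interp(freq_a, freq_b, psd_b, left=np.nan, right=np.nan)
psd_mean = np.nanmean(np.vstack([psd_a, psd_b_interp]), axis=0)
```

Using `left=np.nan` / `right=np.nan` keeps frequencies not covered by a given segment out of that segment's contribution, rather than extrapolating.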
## 7.2 Ensemble PSD and coherence computation
freq_mvp_ens, psd_mvp_ens = ensemble_average_psd(global_freq_mvp, global_psd_mvp)
freq_swot_ens, psd_swot_ens = ensemble_average_psd(global_freq_swot, global_psd_swot)
freq_duacs_ens, psd_duacs_ens = ensemble_average_psd(global_freq_duacs, global_psd_duacs)
freq_coh_ens_mvp_swot, coh_ens_mvp_swot = ensemble_average_coherence(global_freq_coh_mvp_swot, global_coh_mvp_swot)
freq_coh_ens_mvp_duacs, coh_ens_mvp_duacs = ensemble_average_coherence(global_freq_coh_mvp_duacs, global_coh_mvp_duacs)
freq_coh_ens_swot_duacs, coh_ens_swot_duacs = ensemble_average_coherence(global_freq_coh_swot_duacs, global_coh_swot_duacs)
# Convert m² (cyc km)⁻¹ → cm² (cyc km)⁻¹
if psd_mvp_ens is not None: psd_mvp_ens *= 1.0e4
if psd_swot_ens is not None: psd_swot_ens *= 1.0e4
if psd_duacs_ens is not None: psd_duacs_ens *= 1.0e4
# 8. VISUALISATION
## 8.1 Complete Figure – Ensemble PSD and Coherence
### 8.1.1 Figure and Axes
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
### 8.1.2 Spatial‐scale Background
ax.axvspan(1e-6, 0.01, color='yellow', alpha=0.1, zorder=0, label='Large scale (>100 km)')
ax.axvspan(0.01, 0.1, color='green', alpha=0.1, zorder=0, label='Mesoscale (10-100 km)')
ax.axvspan(0.1, 1e2, color='red', alpha=0.1, zorder=0, label='Small scale (<10 km)')
### 8.1.3 Helper for PSD Lines
def plot_psd_line(freq, psd, color, label):
if freq is not None and psd is not None:
valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0)
if np.sum(valid) < 2:
return None
line, = ax.loglog(freq[valid], psd[valid], color=color, linestyle='-', label=label)
return line
return None
colors = plt.cm.viridis(np.linspace(0.3, 0.7, 3)) # colour map sampling
color_mvp, color_swot, color_duacs = colors[0], colors[1], colors[2]
### 8.1.4 PSD Curves
line_mvp = plot_psd_line(freq_mvp_ens, psd_mvp_ens, color_mvp, 'MVP PSD')
line_swot = plot_psd_line(freq_swot_ens, psd_swot_ens, color_swot, 'SWOT PSD')
line_duacs = plot_psd_line(freq_duacs_ens, psd_duacs_ens, color_duacs,'DUACS PSD')
### 8.1.5 Log–log Regression Lines
handles_slopes = []
if freq_mvp_ens is not None and psd_mvp_ens is not None:
m_mvp, c_mvp = loglog_regression(freq_mvp_ens, psd_mvp_ens)
if not np.isnan(m_mvp):
reg_line_mvp, = ax.loglog(freq_mvp_ens, 10**c_mvp * freq_mvp_ens**m_mvp,
color=color_mvp, linestyle='--',
label=f"MVP slope = {m_mvp:.2f}")
handles_slopes.append(reg_line_mvp)
if freq_swot_ens is not None and psd_swot_ens is not None:
m_swot, c_swot = loglog_regression(freq_swot_ens, psd_swot_ens)
if not np.isnan(m_swot):
reg_line_swot, = ax.loglog(freq_swot_ens, 10**c_swot * freq_swot_ens**m_swot,
color=color_swot, linestyle='--',
label=f"SWOT slope = {m_swot:.2f}")
handles_slopes.append(reg_line_swot)
if freq_duacs_ens is not None and psd_duacs_ens is not None:
m_duacs, c_duacs = loglog_regression(freq_duacs_ens, psd_duacs_ens)
if not np.isnan(m_duacs):
reg_line_duacs, = ax.loglog(freq_duacs_ens, 10**c_duacs * freq_duacs_ens**m_duacs,
color=color_duacs, linestyle='--',
label=f"DUACS slope = {m_duacs:.2f}")
handles_slopes.append(reg_line_duacs)
### 8.1.6 Reference‐slope Lines
f_theory = np.linspace(1e-3, 1, 200)
ref_val = psd_mvp_ens[0] * 0.8 if (psd_mvp_ens is not None) and (len(psd_mvp_ens) > 0) else 1e4
theory_line_m3, = ax.loglog(f_theory, ref_val * (f_theory/f_theory[0])**(-3),
color='black', linestyle=':', label="Ref slope -3")
theory_line_m2, = ax.loglog(f_theory, ref_val * (f_theory/f_theory[0])**(-2),
color='gray', linestyle=':', label="Ref slope -2")
handles_slopes.extend([theory_line_m3, theory_line_m2])
### 8.1.7 Secondary X-axis (Wavelength)
secax = ax.secondary_xaxis('top', functions=(lambda f: 1/f, lambda w: 1/w))
secax.set_xlabel('Wavelength (km)')
### 8.1.8 Axes Formatting
ax.set_xlabel("Frequency (cyc/km)")
ax.set_ylabel("PSD [cm²/cyc/km]")
ax.set_title("Ensemble PSD + Coherence (MVP, SWOT, DUACS)")
ax.grid(True, which='both', linestyle='--', alpha=0.4)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlim(1e-3, 1)
### 8.1.9 Secondary Y-axis for Coherence
ax2 = ax.twinx()
ax2.set_ylim(0, 2)
ax2.set_yticks([0, 0.5, 1, 1.5, 2])
ax2.set_yticklabels(['0', '0.5', '1', '', ''])
ax2.set_ylabel('Coherence (0-1)')
coherence_colors = ['darkblue', 'darkgreen', 'darkred']
handles_coherence = []
# MVP–SWOT coherence
if freq_coh_ens_mvp_swot is not None and coh_ens_mvp_swot is not None:
valid_coh = (~np.isnan(freq_coh_ens_mvp_swot)) & (~np.isnan(coh_ens_mvp_swot)) & (freq_coh_ens_mvp_swot > 0)
if np.sum(valid_coh) > 1:
line_coherence_mvp_swot, = ax2.semilogx(freq_coh_ens_mvp_swot[valid_coh],
coh_ens_mvp_swot[valid_coh],
color=coherence_colors[0],
label='Coherence MVP-SWOT',
zorder=3)
handles_coherence.append(line_coherence_mvp_swot)
# MVP–DUACS coherence
if freq_coh_ens_mvp_duacs is not None and coh_ens_mvp_duacs is not None:
valid_coh = (~np.isnan(freq_coh_ens_mvp_duacs)) & (~np.isnan(coh_ens_mvp_duacs)) & (freq_coh_ens_mvp_duacs > 0)
if np.sum(valid_coh) > 1:
line_coherence_mvp_duacs, = ax2.semilogx(freq_coh_ens_mvp_duacs[valid_coh],
coh_ens_mvp_duacs[valid_coh],
color=coherence_colors[1],
label='Coherence MVP-DUACS',
zorder=3)
handles_coherence.append(line_coherence_mvp_duacs)
# SWOT–DUACS coherence
if freq_coh_ens_swot_duacs is not None and coh_ens_swot_duacs is not None:
valid_coh = (~np.isnan(freq_coh_ens_swot_duacs)) & (~np.isnan(coh_ens_swot_duacs)) & (freq_coh_ens_swot_duacs > 0)
if np.sum(valid_coh) > 1:
line_coherence_swot_duacs, = ax2.semilogx(freq_coh_ens_swot_duacs[valid_coh],
coh_ens_swot_duacs[valid_coh],
color=coherence_colors[2],
label='Coherence SWOT-DUACS',
zorder=3)
handles_coherence.append(line_coherence_swot_duacs)
### 8.1.10 Legend (all elements)
handles_curves = [h for h in [line_mvp, line_swot, line_duacs] if h is not None]
patches_fondo = [obj for obj in ax.patches] # shading patches
handles_all = patches_fondo + handles_curves + handles_slopes + handles_coherence
labels_all = [obj.get_label() for obj in handles_all]
fig.legend(handles_all, labels_all, loc='lower center',
bbox_to_anchor=(0.5, -0.25), ncol=3, borderaxespad=0.)
plt.tight_layout()
plt.subplots_adjust(bottom=0.3)
plt.show()
## 8.2 Zoomed Figure – 0.01 ≤ f ≤ 1 cyc km⁻¹
### 8.2.1 Figure and Axes
fig_zoom = plt.figure(figsize=(8,6))
axz = fig_zoom.add_subplot(111)
### 8.2.2 Spatial‐scale Background (zoom)
axz.axvspan(1e-6, 0.01, color='yellow', alpha=0.1, zorder=0, label='Large scale (>100 km)')
axz.axvspan(0.01, 0.1, color='green', alpha=0.1, zorder=0, label='Mesoscale (10-100 km)')
axz.axvspan(0.1, 1e2, color='red', alpha=0.1, zorder=0, label='Small scale (<10 km)')
### 8.2.3 Helper for PSD Lines (zoom)
def plot_psd_line_zoom(freq, psd, color, label):
if freq is not None and psd is not None:
valid = (~np.isnan(freq)) & (~np.isnan(psd)) & (psd > 0)
if np.sum(valid) < 2:
return None
line, = axz.loglog(freq[valid], psd[valid], color=color, linestyle='-', label=label)
return line
return None
line_mvp_z = plot_psd_line_zoom(freq_mvp_ens, psd_mvp_ens, color_mvp, 'MVP PSD')
line_swot_z = plot_psd_line_zoom(freq_swot_ens, psd_swot_ens, color_swot, 'SWOT PSD')
line_duacs_z = plot_psd_line_zoom(freq_duacs_ens, psd_duacs_ens, color_duacs,'DUACS PSD')
### 8.2.4 Regression Lines (zoom)
handles_slopes_z = []
if freq_mvp_ens is not None and psd_mvp_ens is not None:
m_mvp_z, c_mvp_z = loglog_regression(freq_mvp_ens, psd_mvp_ens)
if not np.isnan(m_mvp_z):
reg_line_mvp_z, = axz.loglog(freq_mvp_ens, 10**c_mvp_z * freq_mvp_ens**m_mvp_z,
color=color_mvp, linestyle='--',
label=f"MVP slope = {m_mvp_z:.2f}")
handles_slopes_z.append(reg_line_mvp_z)
if freq_swot_ens is not None and psd_swot_ens is not None:
m_swot_z, c_swot_z = loglog_regression(freq_swot_ens, psd_swot_ens)
if not np.isnan(m_swot_z):
reg_line_swot_z, = axz.loglog(freq_swot_ens, 10**c_swot_z * freq_swot_ens**m_swot_z,
color=color_swot, linestyle='--',
label=f"SWOT slope = {m_swot_z:.2f}")
handles_slopes_z.append(reg_line_swot_z)
if freq_duacs_ens is not None and psd_duacs_ens is not None:
m_duacs_z, c_duacs_z = loglog_regression(freq_duacs_ens, psd_duacs_ens)
if not np.isnan(m_duacs_z):
reg_line_duacs_z, = axz.loglog(freq_duacs_ens, 10**c_duacs_z * freq_duacs_ens**m_duacs_z,
color=color_duacs, linestyle='--',
label=f"DUACS slope = {m_duacs_z:.2f}")
handles_slopes_z.append(reg_line_duacs_z)
### 8.2.5 Reference‐slope Lines (zoom)
f_theory_z = np.linspace(1e-3, 1, 200)
ref_val_z = psd_mvp_ens[0] * 0.8 if (psd_mvp_ens is not None) and (len(psd_mvp_ens) > 0) else 1e4
theory_line_m3_z, = axz.loglog(f_theory_z, ref_val_z * (f_theory_z/f_theory_z[0])**(-3),
color='black', linestyle=':', label="Ref slope -3")
theory_line_m2_z, = axz.loglog(f_theory_z, ref_val_z * (f_theory_z/f_theory_z[0])**(-2),
color='gray', linestyle=':', label="Ref slope -2")
handles_slopes_z.extend([theory_line_m3_z, theory_line_m2_z])
### 8.2.6 Secondary X-axis (wavelength, zoom)
secax_z = axz.secondary_xaxis('top', functions=(lambda f: 1/f, lambda w: 1/w))
secax_z.set_xlabel('Wavelength (km)')
### 8.2.7 Axes Formatting (zoom)
axz.set_xlabel("Frequency (cyc/km)")
axz.set_ylabel("PSD [cm²/cyc/km]")
axz.set_title("Ensemble PSD + Coherence (MVP, SWOT, DUACS) – Zoom")
axz.grid(True, which='both', linestyle='--', alpha=0.4)
axz.set_xscale('log')
axz.set_yscale('log')
axz.set_xlim(0.01, 1)
### 8.2.8 Secondary Y-axis for Coherence (zoom)
ax2z = axz.twinx()
ax2z.set_ylim(0, 2)
ax2z.set_yticks([0, 0.5, 1, 1.5, 2])
ax2z.set_yticklabels(['0', '0.5', '1', '', ''])
ax2z.set_ylabel('Coherence (0-1)')
handles_coherence_z = []
# MVP–SWOT coherence (zoom)
if freq_coh_ens_mvp_swot is not None and coh_ens_mvp_swot is not None:
valid_coh_z = (~np.isnan(freq_coh_ens_mvp_swot)) & (~np.isnan(coh_ens_mvp_swot)) & (freq_coh_ens_mvp_swot > 0)
if np.sum(valid_coh_z) > 1:
line_coherence_mvp_swot_z, = ax2z.semilogx(freq_coh_ens_mvp_swot[valid_coh_z],
coh_ens_mvp_swot[valid_coh_z],
color=coherence_colors[0],
label='Coherence MVP-SWOT',
zorder=3)
handles_coherence_z.append(line_coherence_mvp_swot_z)
# MVP–DUACS coherence (zoom)
if freq_coh_ens_mvp_duacs is not None and coh_ens_mvp_duacs is not None:
valid_coh_z = (~np.isnan(freq_coh_ens_mvp_duacs)) & (~np.isnan(coh_ens_mvp_duacs)) & (freq_coh_ens_mvp_duacs > 0)
if np.sum(valid_coh_z) > 1:
line_coherence_mvp_duacs_z, = ax2z.semilogx(freq_coh_ens_mvp_duacs[valid_coh_z],
coh_ens_mvp_duacs[valid_coh_z],
color=coherence_colors[1],
label='Coherence MVP-DUACS',
zorder=3)
handles_coherence_z.append(line_coherence_mvp_duacs_z)
# SWOT–DUACS coherence (zoom)
if freq_coh_ens_swot_duacs is not None and coh_ens_swot_duacs is not None:
valid_coh_z = (~np.isnan(freq_coh_ens_swot_duacs)) & (~np.isnan(coh_ens_swot_duacs)) & (freq_coh_ens_swot_duacs > 0)
if np.sum(valid_coh_z) > 1:
line_coherence_swot_duacs_z, = ax2z.semilogx(freq_coh_ens_swot_duacs[valid_coh_z],
coh_ens_swot_duacs[valid_coh_z],
color=coherence_colors[2],
label='Coherence SWOT-DUACS',
zorder=3)
handles_coherence_z.append(line_coherence_swot_duacs_z)
### 8.2.9 Legend (zoom)
patches_fondo_z = [obj for obj in axz.patches]
handles_curves_z = [h for h in [line_mvp_z, line_swot_z, line_duacs_z] if h is not None]
handles_all_zoom = patches_fondo_z + handles_curves_z + handles_slopes_z + handles_coherence_z
labels_all_zoom = [obj.get_label() for obj in handles_all_zoom]
fig_zoom.legend(handles_all_zoom, labels_all_zoom, loc='lower center',
bbox_to_anchor=(0.5, -0.25), ncol=3, borderaxespad=0.)
plt.tight_layout()
plt.subplots_adjust(bottom=0.3)
plt.show()
(cell output) "[SPECTRAL ANALYSIS] – PLn in progress..." printed for each of the 20 profiling lines PL5 through PL24.